Author : Vishnu Vijayakumar

Problem Statement

In the first part of this analysis we try to use the Airbnb and Zillow datasets to understand which zip codes would generate the most profit on short term rentals within New York City for 2 bedroom apartments.

We use this information in the later steps to predict the nightly prices of the said apartments in these zipcodes using various Linear and non-linear models such as simple Linear Regression,Random Forest,XGBoost and Neural Nets

Data Overview

Datasets

For this analysis two publicly available datasets from Zillow and AirBnB have been used:

  • Cost data: Zillow provides us an estimate of value for two-bedroom properties

  • Revenue data: AirBnB is the medium through which the investor plans to lease out their investment property. In this data we are able to see how much properties in certain neighborhoods rent out for in New York City

  • The data used for the price prediction part is a subset of the Airbnb data

Assumptions

For this analysis following are assumed(for Break Even analysis)

  • Occupancy rate as ~75%

  • Availability as 365 days

Install Packages

In [1]:
!pip install --upgrade pip
!pip install plotly
!pip install seaborn
!pip install matplotlib
!pip install matplotlib inline
!pip install folium
!pip install xgboost
!pip install keras
!pip install tensorflow==1.2.0 --ignore-installed
!pip install tensorflow
Requirement already up-to-date: pip in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (20.0.2)
Requirement already satisfied: plotly in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (4.6.0)
Requirement already satisfied: retrying>=1.3.3 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from plotly) (1.3.3)
Requirement already satisfied: six in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from plotly) (1.14.0)
Requirement already satisfied: seaborn in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (0.10.0)
Requirement already satisfied: scipy>=1.0.1 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from seaborn) (1.4.1)
Requirement already satisfied: pandas>=0.22.0 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from seaborn) (1.0.1)
Requirement already satisfied: numpy>=1.13.3 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from seaborn) (1.18.1)
Requirement already satisfied: matplotlib>=2.1.2 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from seaborn) (3.1.3)
Requirement already satisfied: pytz>=2017.2 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from pandas>=0.22.0->seaborn) (2019.3)
Requirement already satisfied: python-dateutil>=2.6.1 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from pandas>=0.22.0->seaborn) (2.8.1)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from matplotlib>=2.1.2->seaborn) (2.4.6)
Requirement already satisfied: cycler>=0.10 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from matplotlib>=2.1.2->seaborn) (0.10.0)
Requirement already satisfied: kiwisolver>=1.0.1 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from matplotlib>=2.1.2->seaborn) (1.1.0)
Requirement already satisfied: six>=1.5 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from python-dateutil>=2.6.1->pandas>=0.22.0->seaborn) (1.14.0)
Requirement already satisfied: setuptools in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from kiwisolver>=1.0.1->matplotlib>=2.1.2->seaborn) (45.2.0.post20200210)
Requirement already satisfied: matplotlib in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (3.1.3)
Requirement already satisfied: numpy>=1.11 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from matplotlib) (1.18.1)
Requirement already satisfied: kiwisolver>=1.0.1 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from matplotlib) (1.1.0)
Requirement already satisfied: python-dateutil>=2.1 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from matplotlib) (2.8.1)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from matplotlib) (2.4.6)
Requirement already satisfied: cycler>=0.10 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from matplotlib) (0.10.0)
Requirement already satisfied: setuptools in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from kiwisolver>=1.0.1->matplotlib) (45.2.0.post20200210)
Requirement already satisfied: six>=1.5 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from python-dateutil>=2.1->matplotlib) (1.14.0)
Requirement already satisfied: matplotlib in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (3.1.3)
Requirement already satisfied: inline in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (0.0.1)
Requirement already satisfied: kiwisolver>=1.0.1 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from matplotlib) (1.1.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from matplotlib) (2.4.6)
Requirement already satisfied: numpy>=1.11 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from matplotlib) (1.18.1)
Requirement already satisfied: cycler>=0.10 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from matplotlib) (0.10.0)
Requirement already satisfied: python-dateutil>=2.1 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from matplotlib) (2.8.1)
Requirement already satisfied: setuptools in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from kiwisolver>=1.0.1->matplotlib) (45.2.0.post20200210)
Requirement already satisfied: six in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from cycler>=0.10->matplotlib) (1.14.0)
Requirement already satisfied: folium in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (0.10.1)
Requirement already satisfied: branca>=0.3.0 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from folium) (0.4.0)
Requirement already satisfied: numpy in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from folium) (1.18.1)
Requirement already satisfied: jinja2>=2.9 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from folium) (2.11.1)
Requirement already satisfied: requests in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from folium) (2.22.0)
Requirement already satisfied: six in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from branca>=0.3.0->folium) (1.14.0)
Requirement already satisfied: MarkupSafe>=0.23 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from jinja2>=2.9->folium) (1.1.1)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from requests->folium) (2019.11.28)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from requests->folium) (1.25.8)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from requests->folium) (3.0.4)
Requirement already satisfied: idna<2.9,>=2.5 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from requests->folium) (2.8)
Requirement already satisfied: xgboost in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (1.0.2)
Requirement already satisfied: scipy in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from xgboost) (1.4.1)
Requirement already satisfied: numpy in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from xgboost) (1.18.1)
Requirement already satisfied: keras in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (2.3.1)
Requirement already satisfied: numpy>=1.9.1 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from keras) (1.18.1)
Requirement already satisfied: pyyaml in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from keras) (5.3)
Requirement already satisfied: scipy>=0.14 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from keras) (1.4.1)
Requirement already satisfied: h5py in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from keras) (2.10.0)
Requirement already satisfied: keras-applications>=1.0.6 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from keras) (1.0.8)
Requirement already satisfied: keras-preprocessing>=1.0.5 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from keras) (1.1.0)
Requirement already satisfied: six>=1.9.0 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from keras) (1.14.0)
ERROR: Could not find a version that satisfies the requirement tensorflow==1.2.0 (from versions: 1.13.0rc1, 1.13.0rc2, 1.13.1, 1.13.2, 1.14.0rc0, 1.14.0rc1, 1.14.0, 1.15.0rc0, 1.15.0rc1, 1.15.0rc2, 1.15.0rc3, 1.15.0, 1.15.2, 2.0.0a0, 2.0.0b0, 2.0.0b1, 2.0.0rc0, 2.0.0rc1, 2.0.0rc2, 2.0.0, 2.0.1, 2.1.0rc0, 2.1.0rc1, 2.1.0rc2, 2.1.0, 2.2.0rc0, 2.2.0rc1, 2.2.0rc2, 2.2.0rc3)
ERROR: No matching distribution found for tensorflow==1.2.0
Requirement already satisfied: tensorflow in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (2.1.0)
Requirement already satisfied: tensorboard<2.2.0,>=2.1.0 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from tensorflow) (2.1.1)
Requirement already satisfied: astor>=0.6.0 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from tensorflow) (0.8.1)
Requirement already satisfied: keras-applications>=1.0.8 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from tensorflow) (1.0.8)
Requirement already satisfied: protobuf>=3.8.0 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from tensorflow) (3.11.3)
Requirement already satisfied: termcolor>=1.1.0 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from tensorflow) (1.1.0)
Requirement already satisfied: keras-preprocessing>=1.1.0 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from tensorflow) (1.1.0)
Requirement already satisfied: absl-py>=0.7.0 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from tensorflow) (0.9.0)
Requirement already satisfied: numpy<2.0,>=1.16.0 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from tensorflow) (1.18.1)
Requirement already satisfied: opt-einsum>=2.3.2 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from tensorflow) (3.2.1)
Requirement already satisfied: grpcio>=1.8.6 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from tensorflow) (1.28.1)
Requirement already satisfied: wrapt>=1.11.1 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from tensorflow) (1.11.2)
Requirement already satisfied: scipy==1.4.1; python_version >= "3" in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from tensorflow) (1.4.1)
Requirement already satisfied: google-pasta>=0.1.6 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from tensorflow) (0.2.0)
Requirement already satisfied: wheel>=0.26; python_version >= "3" in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from tensorflow) (0.34.2)
Requirement already satisfied: gast==0.2.2 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from tensorflow) (0.2.2)
Requirement already satisfied: tensorflow-estimator<2.2.0,>=2.1.0rc0 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from tensorflow) (2.1.0)
Requirement already satisfied: six>=1.12.0 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from tensorflow) (1.14.0)
Requirement already satisfied: requests<3,>=2.21.0 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from tensorboard<2.2.0,>=2.1.0->tensorflow) (2.22.0)
Requirement already satisfied: setuptools>=41.0.0 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from tensorboard<2.2.0,>=2.1.0->tensorflow) (45.2.0.post20200210)
Requirement already satisfied: google-auth<2,>=1.6.3 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from tensorboard<2.2.0,>=2.1.0->tensorflow) (1.14.0)
Requirement already satisfied: markdown>=2.6.8 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from tensorboard<2.2.0,>=2.1.0->tensorflow) (3.2.1)
Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from tensorboard<2.2.0,>=2.1.0->tensorflow) (0.4.1)
Requirement already satisfied: werkzeug>=0.11.15 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from tensorboard<2.2.0,>=2.1.0->tensorflow) (1.0.0)
Requirement already satisfied: h5py in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from keras-applications>=1.0.8->tensorflow) (2.10.0)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from requests<3,>=2.21.0->tensorboard<2.2.0,>=2.1.0->tensorflow) (1.25.8)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from requests<3,>=2.21.0->tensorboard<2.2.0,>=2.1.0->tensorflow) (3.0.4)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from requests<3,>=2.21.0->tensorboard<2.2.0,>=2.1.0->tensorflow) (2019.11.28)
Requirement already satisfied: idna<2.9,>=2.5 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from requests<3,>=2.21.0->tensorboard<2.2.0,>=2.1.0->tensorflow) (2.8)
Requirement already satisfied: rsa<4.1,>=3.1.4 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow) (4.0)
Requirement already satisfied: pyasn1-modules>=0.2.1 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow) (0.2.8)
Requirement already satisfied: cachetools<5.0,>=2.0.0 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow) (4.1.0)
Requirement already satisfied: requests-oauthlib>=0.7.0 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.2.0,>=2.1.0->tensorflow) (1.3.0)
Requirement already satisfied: pyasn1>=0.1.3 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from rsa<4.1,>=3.1.4->google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow) (0.4.8)
Requirement already satisfied: oauthlib>=3.0.0 in c:\users\vishnu\anaconda3\anaconda\lib\site-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.2.0,>=2.1.0->tensorflow) (3.1.0)

Libraries

In [1]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
import seaborn as sns
import folium
import warnings
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import explained_variance_score, mean_squared_error, r2_score
from statsmodels.stats.outliers_influence import variance_inflation_factor
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import xgboost as xgb
from keras import models, layers, optimizers, regularizers
from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot
warnings.filterwarnings('ignore')
Using TensorFlow backend.

Parameter setting

In [2]:
folder_path='C:/Users/Vishnu/Desktop/Data Folder'
In [3]:
os.chdir(folder_path)
path=os.getcwd()
costdata = "Zip_Zhvi_2bedroom.csv"          
revenuedata =  "listings.csv"
nbeds=2
availability=365
occupancy=0.75
city = ["New York"]

DATA MANIPULATION

Loading the data

In [4]:
def read_data(path,file):
    full_path = os.path.join(path,file)
    return pd.read_csv(full_path)


revenue_data=read_data(path,revenuedata)
cost_data=read_data(path,costdata)
airbnb_data=revenue_data.copy()
zillow_data=cost_data.copy()

Data Quality Check

Null value check for datasets

Airbnb data

Columns such as square_feet,weekly_price, monthly_price,notes,thumbnails etc have large number of missing values which can affect our analysis if we are to include these columns.

Zillow data

Columns which give the cost data for years from 1996 to 2010 have a considerable amount of values missing

In [5]:
airbnb_data.isnull().sum()
Out[5]:
id                                                  0
listing_url                                         0
scrape_id                                           0
last_scraped                                        0
name                                               16
                                                ...  
calculated_host_listings_count                      0
calculated_host_listings_count_entire_homes         0
calculated_host_listings_count_private_rooms        0
calculated_host_listings_count_shared_rooms         0
reviews_per_month                               10052
Length: 106, dtype: int64
In [6]:
zillow_data.isnull().sum()
Out[6]:
RegionID        0
RegionName      0
City            0
State           0
Metro         250
             ... 
2017-02         0
2017-03         0
2017-04         0
2017-05         0
2017-06         0
Length: 262, dtype: int64
Function to list columns in the datasets which have null values beyond a defined threshold value

This function helps in getting more concrete insights on the percentage of missing values in both the datasets. Set the threshold missing value limit in the cell below and the function would return the columns from the datasets which have a missing value % greater than the set limit

Set the data missing% threshold below

In [7]:
missing_percent_threshold=20
In [8]:
def missing_data_list(df):
    n=missing_percent_threshold
    missing_values=round(df.isnull().sum()/len(df) * 100,2)
    abnormal_cols= missing_values[missing_values>n]
    return abnormal_cols
In [9]:
missing_data_list(zillow_data)
Out[9]:
1996-04    29.76
1996-05    28.86
1996-06    28.86
1996-07    28.81
1996-08    28.79
1996-09    28.79
1996-10    28.79
1996-11    28.68
1996-12    28.68
1997-01    28.41
1997-02    23.62
1997-03    23.40
1997-04    23.40
1997-05    23.40
1997-06    23.37
1997-07    23.37
1997-08    22.29
1997-09    22.26
1997-10    22.26
1997-11    22.22
1997-12    22.18
1998-01    21.99
1998-02    20.37
1998-03    20.36
1998-04    20.85
1998-05    22.05
1998-06    21.92
dtype: float64
In [10]:
missing_data_list(airbnb_data)
Out[10]:
space                           28.69
neighborhood_overview           35.80
notes                           58.71
transit                         34.93
access                          44.38
interaction                     41.03
house_rules                     38.61
thumbnail_url                  100.00
medium_url                     100.00
xl_picture_url                 100.00
host_about                      38.21
host_response_time              33.46
host_response_rate              33.46
host_acceptance_rate           100.00
square_feet                     99.17
weekly_price                    87.72
monthly_price                   89.27
security_deposit                35.42
cleaning_fee                    21.77
first_review                    20.56
last_review                     20.56
review_scores_rating            22.54
review_scores_accuracy          22.62
review_scores_cleanliness       22.59
review_scores_checkin           22.66
review_scores_communication     22.61
review_scores_location          22.66
review_scores_value             22.66
license                         99.96
jurisdiction_names              99.97
reviews_per_month               20.56
dtype: float64

Data Cleaning

Zillow Dataset: Since there is a considerable amount of missing values present in the Zillow dataset, only the last 1 year of cost data has been considered for the analysis.

In [11]:
zillow_df=zillow_data.loc[zillow_data['City'].isin(city)]
zillow_df=pd.concat([zillow_df[['RegionName','City', 'State', 'Metro', 'CountyName', 'SizeRank']],zillow_df.iloc[:,-12:]],axis = 1)
airbnb_df=airbnb_data.loc[airbnb_data['bedrooms']==nbeds]
zillow_df.RegionName = zillow_df.RegionName.astype(str)
airbnb_df=airbnb_df.dropna(subset=['zipcode'])
airbnb_df.zipcode=airbnb_df.zipcode.astype(float)
airbnb_df.zipcode=airbnb_df.zipcode.astype(int).astype(str)
Function to merge datasets
In [12]:
def merge_data(cost_data,price_data,costdata_col,pricedata_col):
    merged_data = cost_data.merge(price_data, how='inner', left_on=costdata_col, right_on=pricedata_col)
    merged_data=merged_data.drop_duplicates()
    return merged_data
In [13]:
final_data=merge_data(zillow_df,airbnb_df,'RegionName','zipcode')
final_data.groupby(['RegionName','neighbourhood_group_cleansed']).size()
Out[13]:
RegionName  neighbourhood_group_cleansed
10003       Manhattan                       136
10011       Manhattan                       106
10013       Brooklyn                          1
            Manhattan                       104
10014       Manhattan                        90
10021       Manhattan                        28
10022       Manhattan                        71
10023       Manhattan                        78
10025       Manhattan                       124
10028       Manhattan                        73
10036       Manhattan                       147
10128       Manhattan                        65
10303       Staten Island                     4
10304       Staten Island                     3
10305       Staten Island                    12
10306       Staten Island                     2
10308       Staten Island                     2
10309       Staten Island                     1
10314       Staten Island                     2
11201       Brooklyn                         85
11215       Brooklyn                        189
11217       Brooklyn                        124
11231       Brooklyn                         93
11234       Brooklyn                          9
11434       Queens                           16
dtype: int64
Data issue :

Its observed that one instance of the zipcode '10013' is erroneously mapped to Brooklyn in the dataset. On examining the neighbourhood and location of this 10013-Brooklyn data, it is found that it belongs to 'Bergen Beach' neighbourhood which is actually in Brooklyn. But remapping the zipcode at this point would result in errors when the prices of the apartments are calculated as the prices are already merged from zillow. So this one erroneous record of Bergen Beach, Brooklyn with zipcode '11234' can be removed, given the complexity and time constraint involved in fixing it otherwise.

*Several such cases can be found in the original airbnb data

In [14]:
final_data[((final_data.RegionName == '10013') & (final_data.neighbourhood_group_cleansed == 'Brooklyn'))][['neighbourhood','longitude','latitude']]
Out[14]:
neighbourhood longitude latitude
1400 Bergen Beach -73.90162 40.61807
In [15]:
ind=final_data[((final_data.RegionName == '10013') & (final_data.neighbourhood_group_cleansed == 'Brooklyn'))].index
final_data=final_data.drop(ind)
final_data.groupby(['RegionName','neighbourhood_group_cleansed']).size()
Out[15]:
RegionName  neighbourhood_group_cleansed
10003       Manhattan                       136
10011       Manhattan                       106
10013       Manhattan                       104
10014       Manhattan                        90
10021       Manhattan                        28
10022       Manhattan                        71
10023       Manhattan                        78
10025       Manhattan                       124
10028       Manhattan                        73
10036       Manhattan                       147
10128       Manhattan                        65
10303       Staten Island                     4
10304       Staten Island                     3
10305       Staten Island                    12
10306       Staten Island                     2
10308       Staten Island                     2
10309       Staten Island                     1
10314       Staten Island                     2
11201       Brooklyn                         85
11215       Brooklyn                        189
11217       Brooklyn                        124
11231       Brooklyn                         93
11234       Brooklyn                          9
11434       Queens                           16
dtype: int64

The missing data list obtained using the missing_data_list function helps in deciding if a column needs to be retained/dropped/transformed for our analysis. For this analysis columns only relevant columns required for the analysis have been retained. The missing value issues can be taken up at a later stage

In [16]:
final= pd.concat([final_data.iloc[:,0:18],final_data[['id','neighbourhood_group_cleansed','city', 'state', 'latitude', 'longitude', 'price', 'weekly_price',
                          'monthly_price', 'security_deposit', 'cleaning_fee', 'availability_365', 'number_of_reviews','host_id','host_is_superhost']]],axis=1)
Data issue :

The price values are found to have '$' and ',' which needs to be removed before the computation steps which follow.

In [17]:
final[['monthly_price','weekly_price','security_deposit','price','cleaning_fee']].head()
Out[17]:
monthly_price weekly_price security_deposit price cleaning_fee
0 NaN NaN $350.00 $95.00 $100.00
1 NaN $1,100.00 $200.00 $165.00 $85.00
2 $4,400.00 $1,100.00 $250.00 $250.00 $100.00
3 $15,000.00 NaN NaN $2,000.00 NaN
4 NaN NaN $0.00 $300.00 $100.00
Function to clean the price columns in the final dataset
In [18]:
def cleanprices(row):
    row_str=row.astype(str)
    row_replace=row_str.str.replace(',', '')        
    row_clean=row_replace.str.replace('$', '')
    row_clean=row_clean.astype(float)
    return row_clean

final[['monthly_price','weekly_price','security_deposit','price','cleaning_fee']]=final[['monthly_price','weekly_price','security_deposit','price','cleaning_fee']].apply(cleanprices)

Data Quality insights and actions performed

  • Missing values : As mentioned above there are quite a few columns with substantial amount of missing values including a few which could have otherwise played an important part in the analysis:

    • cleaning_price
    • security_deposit
    • review_scores_rating
    • square_feet
    • price : There are a few properties which have their prices listed as 0. However for our current analysis this is not a scenario as they get filtered out when we only analyse two bedroom apartments
  • Some zipcodes had length other than 5 in the original dataset, eventhough such cases are not present in the scenario considered for the analysis

  • Timeline issue: The airbnb data was last scraped in the third quarter of the FY 2019 which is not the case with Zillow data which provides property cost prices only uptill June 2017. We could forecast these recent cost prices using a time series model for a more robust analysis but this has been proposed as a future step. However, in the current analysis the cost prices have been assumed to be latest prices as time value of money discount rate is assumed to be 0%

  • Data cleaning was performed by

    • removing inconsistencies : zipcodes of abnormal length, incorrectly mapped zipcodes
    • conversion into suitable data type : There are certain columns which were reformatted for calculation,data type had to be changed to suitable types.
    • cleaning price columns : Price columns had '$' and '' which had to be removed for computational analysis
In [19]:
final= final.rename(columns={'RegionName': 'Zipcodes', 'neighbourhood_group_cleansed': 'Neighbourhood'})
print("The Final data dimensions are "+ str(final.shape))
The Final data dimensions are (1564, 33)

Exploratory Data Analysis

Price Distribution Analysis

Table showing number of listings by Neighbourhood, Population(SizeRank) Average Price and No of listings in each ZipCode

Column metadata
  • Avg_Price - Average daily price calculated for the apartments in each zipcode
  • No.of listings - Count of apartments in each zipcode
In [20]:
final_analysis_1=final.groupby(['Neighbourhood','Zipcodes','SizeRank']).agg({'price':'mean','id':'nunique'}).rename(columns={'price': 'Avg_price','id':'No.of listings'}).round({'Avg_price':0}).sort_values(['SizeRank','No.of listings'],ascending=True).reset_index()
final_analysis_1
Out[20]:
Neighbourhood Zipcodes SizeRank Avg_price No.of listings
0 Manhattan 10025 1 320.0 124
1 Manhattan 10023 3 288.0 78
2 Manhattan 10128 14 253.0 65
3 Manhattan 10011 15 369.0 106
4 Manhattan 10003 21 319.0 136
5 Brooklyn 11201 32 244.0 85
6 Brooklyn 11234 52 135.0 9
7 Staten Island 10314 68 73.0 2
8 Brooklyn 11215 71 182.0 189
9 Manhattan 10028 109 274.0 73
10 Manhattan 10021 190 258.0 28
11 Manhattan 10014 379 345.0 90
12 Manhattan 10036 580 339.0 147
13 Queens 11434 622 137.0 16
14 Staten Island 10306 668 118.0 2
15 Manhattan 10022 894 349.0 71
16 Brooklyn 11217 1555 234.0 124
17 Manhattan 10013 1744 401.0 104
18 Brooklyn 11231 1817 208.0 93
19 Staten Island 10304 1958 93.0 3
20 Staten Island 10305 2087 132.0 12
21 Staten Island 10309 3682 85.0 1
22 Staten Island 10308 4149 110.0 2
23 Staten Island 10303 4647 104.0 4
In [21]:
final.price.describe().round()
Out[21]:
count    1564.0
mean      285.0
std       255.0
min        50.0
25%       165.0
50%       228.0
75%       321.0
max      4000.0
Name: price, dtype: float64
Observation
  • It is observed from the summary that there are some extremely high values for prices which need to be investgated

Boxplots of Price by Zipcode

In [22]:
plt.figure(figsize=(20,12))
bplot = sns.boxplot(y='price', x='Zipcodes', 
                 data=final, 
                 width=0.5,
                 palette="colorblind")
bplot.axes.set_title("Boxplots of Listing Prices by ZipCodes in NY without Outlier Treatment",fontsize=16)
bplot.set_xlabel("Zipcodes", fontsize=14) 
bplot.set_ylabel("Price",fontsize=14)
bplot.tick_params(labelsize=12)
Observation
  • Its observed that some zipcodes such as 10003 and 11217 have some abnormally high prices which should be capped

Capping outliers

Function to cap outliers in prices that are abnormally high or low

In [23]:
def outlier(col):
    percentiles = col.quantile([0.01,0.99]).values
    col = np.clip(col, percentiles[0], percentiles[1])
    return col
    
final[['price']]=final[['price']].apply(outlier)
In [24]:
final.price.describe().round()
Out[24]:
count    1564.0
mean      277.0
std       195.0
min        68.0
25%       165.0
50%       228.0
75%       321.0
max      1379.0
Name: price, dtype: float64
Observation
  • The range of price is now between 60-1500 and the extremely high values have been capped to 99% percentile value.
In [25]:
plt.figure(figsize=(20,12))
bplot = sns.boxplot(y='price', x='Zipcodes', 
                 data=final, 
                 width=0.5,
                 palette="colorblind")
bplot.axes.set_title("Boxplots of Listing Prices by ZipCodes in NY without Outlier Treatment",fontsize=16)
bplot.set_xlabel("Zipcodes",fontsize=14)
bplot.set_ylabel("Price",fontsize=14)
bplot.tick_params(labelsize=12)

Price Distribution by Zipcode and Neighbourhood

Column metadata
  • price - Mean price of listings for each zipcode
In [26]:
grouped_values=final.groupby(['Neighbourhood','Zipcodes']).agg({'price':'mean'}).sort_values(by='price',ascending=False).reset_index()
plt.figure(figsize=(15,12));
bar=sns.factorplot(x='Zipcodes',y='price',hue='Neighbourhood',data=grouped_values,
                        size=6,  aspect=2,
                        kind='bar', 
                        dodge=False,legend_out=False);
bar.ax.set_title("ZipCodes by Average Price of the listings in that ZipCode",fontsize=16);
bar.set_xlabels("ZipCodes",fontsize=14).add_legend(title='Neighbourhood');
bar.set_ylabels("Daily Price of the listings",fontsize=14);
<Figure size 1080x864 with 0 Axes>
Observation
  • Manhattan is observed to have highest priced listings followed by Brooklyn , Staten Islands and Queens

Cost Distribution Analysis

Column metadata
  • Average Cost - Average property cost of last 12 months from 2016-07 to 2017-06
In [27]:
cost_trend=pd.concat([final[['Zipcodes','Neighbourhood']],final.iloc[:,6:18]],axis = 1)
cost_trend=cost_trend.melt(id_vars=['Zipcodes','Neighbourhood'],var_name="Year",value_name="Median Cost")
cost_trend=cost_trend.drop_duplicates()
cost_trend.Zipcodes=cost_trend.Zipcodes.astype(float).astype(int).astype(str)
In [28]:
df = cost_trend
fig = px.line(df, x="Year", y="Median Cost", color="Zipcodes", facet_row="Neighbourhood",width=800,height=800)
fig.for_each_annotation(lambda a: a.update(text=a.text.replace("Neighbourhood=", "")))
fig.for_each_trace(lambda t: t.update(name=t.name.replace("Zipcodes=", "")))
fig.update_layout(showlegend=True)
fig.update_xaxes(ticks="inside",nticks=20,showline=True, linewidth=2, linecolor='black', mirror=True)
fig.update_yaxes(ticks="inside", col=1,showline=True, linewidth=2, linecolor='black', mirror=True)
fig.update_layout(legend=dict(x=0, y=-0.2))
fig.update_layout(legend_orientation="h")
fig.update_traces(hoverinfo='text+name', mode='lines+markers')
fig.update_layout(
     title={
        'text': "Cost Trend of the Properties by ZipCode for last 12 Months",
        'y':0.96,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    
xaxis_title="Year",
yaxis_title="Median cost",
font=dict(family="Courier New, monospace",size=10,color="#7f7f7f"))
fig.show()
  • Since the median costs does not vary much over a time frame of past 12 months, inorder for accurate results and to take the recency factor into account, an average of past 6 months cost data has been considered for analysis

Cost Distribution by Zipcode and Neighbourhood

Column metadata
  • avg_cost - Average property cost of last 6 months from 2017-01 to 2017-06 to take the recency factor into account and for more accurate results
In [29]:
avg_cost_latest = zillow_df.copy()
avg_cost_latest = avg_cost_latest.loc[:, ['RegionName']]
avg_cost_latest['avg_cost'] = zillow_df[zillow_df.columns[-6:]].mean(axis =1)
In [30]:
final_data_cost=merge_data(avg_cost_latest,final,'RegionName','Zipcodes')
In [31]:
grouped_values_cost=final_data_cost.groupby(['Neighbourhood','Zipcodes']).agg({'avg_cost':'mean'}).sort_values(by='avg_cost',ascending=False).reset_index()
plt.figure(figsize=(15,12))
bar=sns.factorplot(x='Zipcodes',y='avg_cost',hue='Neighbourhood',data=grouped_values_cost,
                        size=6,  aspect=2,
                        kind='bar', 
                        dodge=False,legend_out=False)
bar.ax.set_title("ZipCodes by Average Cost of the listings in that ZipCode",fontsize=16)
bar.set_xlabels("ZipCodes", fontsize=14).add_legend(title='Neighbourhood')
bar.set_ylabels("Average Cost of the listings",fontsize=14)
Out[31]:
<seaborn.axisgrid.FacetGrid at 0x17b4d04a2c8>
<Figure size 1080x864 with 0 Axes>
Observation
  • Similar to the price listings, Manhattan is observed to have the highest cost of properties followed by Brooklyn and Staten Islands

Cost Benefit Analysis

Column metadata
  • Normalized_price - Average price for listings aggregated based on zipcodes normalized for quadrant analysis
  • Normalized_cost - Average cost for listings aggregated based on zipcodes normalized for quadrant analysis

The Zipcodes have been split across four quadrants to identify profitable ones

High cost High return zipcodes: These are the Zipcodes that are profitable but do not make very high profits due to the high costs involved at the time of purchase. Most of the Manhattan properties fall in this category Example ZipCodes:

10014
10013
10011

Low Cost High return zipcodes:These are highly recommended ZipCodes as they make high profits with lesser investment. However not many zipcodes fall into this category Example zipcodes:

10025
10036

Low cost Low return zipcodes:These Zipcodes are recommended after the ones in the Second Quadrant. they involve lesser cost and lesser or higher price/revenue. Example zipcodes:

11434
10306
10305

High cost Low return zipcodes: There are no ZipCodes this belong to this Quadrant

In [32]:
final_data_cost['Normalized_price'] = (final_data_cost.price-final_data_cost.price.mean())/final_data_cost.price.std()
final_data_cost['Normalized_cost'] = (final_data_cost.avg_cost-final_data_cost.avg_cost.mean())/final_data_cost.avg_cost.std()
In [33]:
df = final_data_cost
grouped_values_bubble=final_data_cost.groupby(['Neighbourhood','Zipcodes']).agg({'Normalized_cost':'mean','Normalized_price':'mean'}).reset_index()
fig = px.scatter(grouped_values_bubble, x="Normalized_cost", y="Normalized_price", color="Neighbourhood",
                 hover_data=['Zipcodes'],size_max=60,width=800, height=600)
fig.for_each_trace(lambda t: t.update(name=t.name.replace("Neighbourhood=", "")))
fig.update_traces(mode="markers",marker=dict(size=12))
fig.update_layout(
     title={
        'text': "Cost Benefit Analysis of Zipcodes",
        'y':0.96,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    
    xaxis_title="Normalized Cost",
    yaxis_title="Normalized Price",
    font=dict(
        family="Courier New, monospace",
        size=12,
        color="#7f7f7f"))

fig.update_layout(autosize=False,width=800,height=600)
fig.add_shape(
    go.layout.Shape(
            type="line",
            x0=0,
            y0=2,
            x1=0,
            y1=-2,
            line=dict(
                color="grey",
                width=3,dash="dashdot"
            )
))
fig.add_shape(
        go.layout.Shape(
            type="line",
            x0=-3,
            y0=-0,
            x1=3,
            y1=0,
            line=dict(
                color="grey",
                width=4,
                dash="dashdot",
            ),
    ))

fig.show()

BreakEven Analysis

Column metadata
  • Breakeven_years - Time required in years to breakeven
  • Avg_price - Average price of listings aggregated over zipcodes
  • Mean cost - Average property cost aggregated over zipcodes for the latest 6 months of data (2017-01 to 2017-06)

Top Zipcodes by breakeven period

The break-even point is the point at which total cost and total revenue are equal. There is no net loss or gain. At break-even point revenues are equal to exactly all of the expenses on a single income statement prepared under the accrual method of accounting.

The break-even point for this business case can be determined as given below

BreakevenPoint = AssetCost / (DailyPrice∗availability∗OccupancyRate)

In [34]:
breakeven_analysis =final_data_cost.groupby(['Neighbourhood','Zipcodes']).agg({'price':'mean','avg_cost':'mean'}).rename(columns={'price': 'Avg_price','avg_cost':'Mean_cost'}).round({'Avg_price':2,'Mean cost':2}).reset_index()
breakeven_analysis['Breakeven_years']= round(breakeven_analysis.Mean_cost/(breakeven_analysis.Avg_price*availability*occupancy),2)

The below graph shows the breakeven points(in years) sorted from lowest to highest based on the neighbourhoods.

In [35]:
breakeven_analysis["Zip-Neighbourhood"] = breakeven_analysis["Zipcodes"].map(str)+ '-'+breakeven_analysis["Neighbourhood"].str[:1]
fig = px.bar(breakeven_analysis,  x="Zip-Neighbourhood", y="Breakeven_years",
             hover_data=['Neighbourhood'], color='Neighbourhood',
             height=500,width=1000)
fig.for_each_trace(lambda t: t.update(name=t.name.replace("Neighbourhood=", "")))
fig.update_xaxes(categoryorder="total ascending")
fig.update_layout(
     title={
        'text': "BreakEven point analysis by Zipcode",
        'y':0.96,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    
    xaxis_title="Zip-Neighbourhood",
    yaxis_title="BreakEven Time in Years",
    font=dict(
        family="Courier New, monospace",
        size=12,
        color="#7f7f7f"))
fig.update_layout(legend_orientation="v")


fig.show()
Observation
  • It is observed that the zipcodes 11434,10306,10303,10305,10304 are the ones that recoup the entire money spent on the property the earliest. It should be noted that most of the zipcodes in the Staten Island neighbourhood breaks even early as when compared to rest.

Breakeven Analysis by Zipcode on Map

Column metadata
  • Avg_price - Average price of listings aggregated over zipcodes
  • Mean_cost - Average cost of listings aggregated over zipcodes for the latest 6 months of data (2017-01 to 2017-06)
  • Latitude - Average of Latitude values of all listings in a zipcode to get an approximate latitude value for the zipcode
  • Longitude - Average of Longitude values of all listings in a zipcode to get an approximate longitude value for the zipcode

ZIP Codes with bargraphs isn't a very intuitive way to present results by itself. Presenting the the same on a map would help the real estate company visually understand the results better.

The size of the bubble is inversely proportional to the breakeven value in years, The larger the size of the bubbles the shorter is the time to breakeven and vice-versa. The bubbles can be clicked to see all the information associated with each point.

In [36]:
map_data =final_data_cost.groupby(['Neighbourhood','Zipcodes']).agg({'price':'mean','avg_cost':'mean','latitude':'mean','longitude':'mean'}).rename(columns={'price': 'Avg_price','avg_cost':'Mean_cost','latitude':'Latitude','longitude':'Longitude'}).round({'Avg_price':2,'Mean cost':2}).reset_index()
map_data['Breakeven_years']= round(map_data.Mean_cost/(map_data.Avg_price*availability*occupancy),2)
folium_map_Breakeven = folium.Map(location=[40.67, -74],
                        zoom_start=11,
                        tiles="CartoDB dark_matter")
for index,row in map_data.iterrows():
    popup_text = "Zipcode: {}<br> Neighbourhood: {}<br> Breakeven years: {}"
    popup_text = popup_text.format(row["Zipcodes"],row["Neighbourhood"],row["Breakeven_years"])
    if row["Neighbourhood"]=='Manhattan':
        color="#FFCE00" # orange
    elif row["Neighbourhood"]=='Queens':
        color="#E37222"
    elif row["Neighbourhood"]=='Staten Island':
        color="#0375B4" # blue
    else :
        color="#0A8A9F" # teal
            
    radius= row['Breakeven_years']
    folium.CircleMarker(location=(row["Latitude"],row["Longitude"]),radius=-radius+35,color=color,popup=popup_text,
                        fill=True).add_to(folium_map_Breakeven)
folium_map_Breakeven
Out[36]:

Competitive Analysis

Column metadata
  • Total_listings - Count of apartments in each zipcode
  • Total_super_hosts - Count of super hosts present in zipcode
  • Percentage_of_superhosts - Percentage of superhosts out of total listings in each zipcode
  • Latitude - Average of Latitude values of all listings in a zipcode to get an approximate latitude value for the zipcode
  • Longitude - Average of Longitude values of all listings in a zipcode to get an approximate longitude value for the zipcode

Venturing into a market without understanding the competition is risky. Identifying our competitors is important before we finalize our business strategies. It is vital to the success of a new or existing business because it reduces risk, time, required resources, and expense.

As per Airbnb, a super host is someone who provides a shining example for other hosts, and provide extraordinary experiences for their guests. So, it is important to look where our competition is before we start investing.

Rather than looking at the number of superhosts, an analysis of their proportion out of the total listings would give us a better idea of the competition.

In [37]:
final_org=final.groupby(['Zipcodes','Neighbourhood']).agg({'latitude':'mean','longitude':'mean','host_is_superhost':'count'}).rename(columns={'latitude':'Latitude','longitude':'Longitude'}).reset_index()
final_sh=final[final.host_is_superhost=='t']
final_sh=final_sh.groupby(['Zipcodes','Neighbourhood'])['host_is_superhost'].count().reset_index()
final_competitors = final_org.merge(final_sh, how='inner', left_on='Zipcodes', right_on='Zipcodes')
final_competitors=final_competitors.drop_duplicates()
final_competitors = pd.merge(final_org,final_sh[['Zipcodes','host_is_superhost']],on='Zipcodes', how='inner')
final_competitors=final_competitors.rename(columns={'host_is_superhost_x':'Total_listings','host_is_superhost_y':'Total_super_hosts'})
final_competitors['Percentage_of_super_hosts']= round(final_competitors.Total_super_hosts/final_competitors.Total_listings * 100,2)
In [38]:
folium_map_competitors = folium.Map(location=[40.67, -74],
                        zoom_start=11,
                        tiles="CartoDB dark_matter")
for index,row in final_competitors.iterrows():
    popup_text = "Zipcode: {}<br> Neighbourhood: {}<br> Listings: {}<br> Superhosts: {}<br> Superhosts %: {}"
    popup_text = popup_text.format(row["Zipcodes"],row["Neighbourhood"],row["Total_listings"],row["Total_super_hosts"],row["Percentage_of_super_hosts"])
    if row["Neighbourhood"]=='Manhattan':
        color="#FFCE00" # orange
    elif row["Neighbourhood"]=='Queens':
        color="#E37222"
    elif row["Neighbourhood"]=='Staten Island':
        color="#0375B4" # blue
    else :
        color="#0A8A9F" # teal
            
    radius= row['Percentage_of_super_hosts']
    folium.CircleMarker(location=(row["Latitude"],row["Longitude"]),radius=radius/2,color=color,popup=popup_text,
                        fill=True).add_to(folium_map_competitors)

folium_map_competitors
Out[38]:
Observation
  • The size of the bubbles represents the percentage of super hosts out of the total listings. We may choose to ignore the zipcode 10308 in Staten Island as this has only 2 listings. However, it can be observed that in Queens, out of the 16 listings 8 of them (50%) are superhosts making this neighbourhood highly competitive for someone who is planning to invest in this area.

Approach

A thorough study of the data from the different sources was performed followed by data munging and cleaning steps

A basic exploratory data analysis was performed on the cleaned dataset to derive insights

Created different plots to understand the price and cost distribution

Performed a Price vs Cost Quadrant Analysis to understand the high-risk and high-return ZipCodes

Performed Breakeven Analysis to understand the ZipCodes which recoup the investment the earliest

Performed a Competitve Analysis to understand the competiton and identify highly competitive Zipcodes before investing

Price Prediction - LR | Random Forests | XGBoost | Neural Nets

Data Preprocessing

In [39]:
#We will use the final_data for the analysis
final_data.head()
Out[39]:
RegionName City State Metro CountyName SizeRank 2016-07 2016-08 2016-09 2016-10 ... instant_bookable is_business_travel_ready cancellation_policy require_guest_profile_picture require_guest_phone_verification calculated_host_listings_count calculated_host_listings_count_entire_homes calculated_host_listings_count_private_rooms calculated_host_listings_count_shared_rooms reviews_per_month
0 10025 New York NY New York New York 1 1373300 1382600 1374400 1364100 ... t f strict_14_with_grace_period f f 1 1 0 0 0.08
1 10025 New York NY New York New York 1 1373300 1382600 1374400 1364100 ... f f strict_14_with_grace_period f f 1 1 0 0 2.18
2 10025 New York NY New York New York 1 1373300 1382600 1374400 1364100 ... t f strict_14_with_grace_period f f 1 1 0 0 1.94
3 10025 New York NY New York New York 1 1373300 1382600 1374400 1364100 ... f f strict_14_with_grace_period t t 11 11 0 0 0.33
4 10025 New York NY New York New York 1 1373300 1382600 1374400 1364100 ... f f strict_14_with_grace_period t t 3 1 2 0 0.47

5 rows × 124 columns

In [40]:
final_data.columns
Out[40]:
Index(['RegionName', 'City', 'State', 'Metro', 'CountyName', 'SizeRank',
       '2016-07', '2016-08', '2016-09', '2016-10',
       ...
       'instant_bookable', 'is_business_travel_ready', 'cancellation_policy',
       'require_guest_profile_picture', 'require_guest_phone_verification',
       'calculated_host_listings_count',
       'calculated_host_listings_count_entire_homes',
       'calculated_host_listings_count_private_rooms',
       'calculated_host_listings_count_shared_rooms', 'reviews_per_month'],
      dtype='object', length=124)
In [41]:
# subsetting the data to include only useful columns 
model_data=final_data[['price','security_deposit','cleaning_fee','extra_people','host_response_rate','host_is_superhost','host_listings_count','host_has_profile_pic','host_identity_verified','neighbourhood_group_cleansed','city','state','property_type','room_type','accommodates','bathrooms','bedrooms','beds','guests_included','number_of_reviews','review_scores_value','instant_bookable','is_business_travel_ready','reviews_per_month']]
In [42]:
# investigating the column types
model_data.dtypes
Out[42]:
price                            object
security_deposit                 object
cleaning_fee                     object
extra_people                     object
host_response_rate               object
host_is_superhost                object
host_listings_count             float64
host_has_profile_pic             object
host_identity_verified           object
neighbourhood_group_cleansed     object
city                             object
state                            object
property_type                    object
room_type                        object
accommodates                      int64
bathrooms                       float64
bedrooms                        float64
beds                            float64
guests_included                   int64
number_of_reviews                 int64
review_scores_value             float64
instant_bookable                 object
is_business_travel_ready         object
reviews_per_month               float64
dtype: object

It is observed that the price columns need to be converted as before. Also few other columns would need some transformation to make them fit for the analysis

In [43]:
missing_data_list(model_data)
Out[43]:
security_deposit       26.85
host_response_rate     30.75
review_scores_value    24.81
reviews_per_month      23.40
dtype: float64
In [44]:
model_data.host_response_rate = model_data.host_response_rate.str[:-1].astype('float64')
In [45]:
#Analyzing the host response rate column to bin and understand how the data is spread
print("Median host response rate:", model_data['host_response_rate'].median())
print(f"Proportion of 100% host response rates: {round(((model_data.host_response_rate == 100.0).sum()/model_data.host_response_rate.count())*100,1)}%")
Median host response rate: 100.0
Proportion of 100% host response rates: 65.4%
In [46]:
# Bin into four categories
model_data.host_response_rate = pd.cut(model_data.host_response_rate, bins=[0, 50, 90, 99, 100], labels=['0-49%', '50-89%', '90-99%', '100%'], include_lowest=True)
# Converting to string to make it categories
model_data.host_response_rate = model_data.host_response_rate.astype('str')
model_data.host_response_rate.replace('nan', 'unknown', inplace=True)
In [47]:
#Check nulls in host related columns
len(model_data[model_data.loc[ :,['host_is_superhost', 'host_listings_count', 'host_has_profile_pic', 'host_identity_verified'] ].isnull().sum(axis=1) == 5])
Out[47]:
0
In [48]:
#The property type can be grouped to reduce the no of levels
model_data.property_type.value_counts()
Out[48]:
Apartment             1289
Condominium             68
House                   57
Loft                    55
Townhouse               44
Serviced apartment      35
Guest suite              8
Other                    3
Villa                    1
Boutique hotel           1
Bungalow                 1
Guesthouse               1
Yurt                     1
Name: property_type, dtype: int64
In [49]:
# Replacing categories that are types of houses or apartments
model_data.property_type.replace({
    'Townhouse': 'House',
    'Serviced apartment': 'Apartment',
    'Loft': 'Apartment',
    'Bungalow': 'House',
    'Cottage': 'House',
    'Villa': 'House',
    'Tiny house': 'House',
    'Earth house': 'House',
    'Chalet': 'House'  
    }, inplace=True)
model_data.loc[~model_data.property_type.isin(['House', 'Apartment']), 'property_type'] = 'Other'
In [50]:
#imputing with median for null values in columns such as bathrooms, bedrooms and beds
for col in ['bathrooms', 'bedrooms', 'beds']:
    model_data[col].fillna(model_data[col].median(), inplace=True)
In [51]:
#cleaning price columns
model_data[['security_deposit','price','cleaning_fee','extra_people']]=model_data[['security_deposit','price','cleaning_fee','extra_people']].apply(cleanprices)
In [52]:
model_data[['security_deposit']].dtypes
Out[52]:
security_deposit    float64
dtype: object
In [53]:
#cleaning up columns - imputing 0 
model_data.security_deposit.fillna(0, inplace=True)
model_data.cleaning_fee.fillna(0, inplace=True)
model_data.extra_people.fillna(0, inplace=True)
In [54]:
model_data.isna().sum()
Out[54]:
price                             0
security_deposit                  0
cleaning_fee                      0
extra_people                      0
host_response_rate                0
host_is_superhost                 1
host_listings_count               1
host_has_profile_pic              1
host_identity_verified            1
neighbourhood_group_cleansed      0
city                              3
state                             1
property_type                     0
room_type                         0
accommodates                      0
bathrooms                         0
bedrooms                          0
beds                              0
guests_included                   0
number_of_reviews                 0
review_scores_value             388
instant_bookable                  0
is_business_travel_ready          0
reviews_per_month               366
dtype: int64
In [55]:
model_data.dropna(subset = ["host_is_superhost"], inplace=True)
model_data.isna().sum()
Out[55]:
price                             0
security_deposit                  0
cleaning_fee                      0
extra_people                      0
host_response_rate                0
host_is_superhost                 0
host_listings_count               0
host_has_profile_pic              0
host_identity_verified            0
neighbourhood_group_cleansed      0
city                              3
state                             1
property_type                     0
room_type                         0
accommodates                      0
bathrooms                         0
bedrooms                          0
beds                              0
guests_included                   0
number_of_reviews                 0
review_scores_value             387
instant_bookable                  0
is_business_travel_ready          0
reviews_per_month               365
dtype: int64
In [56]:
#Function to bin columns
def binning(col, bins, labels, na_label='unknown'):
    model_data[col] = pd.cut(model_data[col], bins=bins, labels=labels, include_lowest=True)
    model_data[col] = model_data[col].astype('str')
    model_data[col].fillna(na_label, inplace=True)

binning('review_scores_value',
           bins=[0, 8, 9, 10],
           labels=['0-8/10', '9/10', '10/10'],
           na_label='no reviews')
In [57]:
model_data.drop(['reviews_per_month','city','state'], axis=1, inplace=True)
In [58]:
model_data.isna().sum()
Out[58]:
price                           0
security_deposit                0
cleaning_fee                    0
extra_people                    0
host_response_rate              0
host_is_superhost               0
host_listings_count             0
host_has_profile_pic            0
host_identity_verified          0
neighbourhood_group_cleansed    0
property_type                   0
room_type                       0
accommodates                    0
bathrooms                       0
bedrooms                        0
beds                            0
guests_included                 0
number_of_reviews               0
review_scores_value             0
instant_bookable                0
is_business_travel_ready        0
dtype: int64
In [59]:
to_drop = ['bedrooms','neighbourhood_group_cleansed']
model_data.drop(to_drop, axis=1, inplace=True)
transformed_data = pd.get_dummies(model_data,drop_first=True)
In [60]:
transformed_data.head()
Out[60]:
price security_deposit cleaning_fee extra_people host_listings_count accommodates bathrooms beds guests_included number_of_reviews ... host_is_superhost_t host_has_profile_pic_t host_identity_verified_t property_type_House property_type_Other room_type_Private room review_scores_value_10/10 review_scores_value_9/10 review_scores_value_nan instant_bookable_t
0 95.0 350.0 100.0 0.0 1.0 3 1.0 2.0 1 2 ... 0 1 1 0 0 0 0 0 0 1
1 165.0 200.0 85.0 50.0 4.0 4 1.0 2.0 2 191 ... 0 1 1 0 0 0 0 1 0 0
2 250.0 250.0 100.0 0.0 2.0 7 1.0 3.0 1 188 ... 0 1 0 0 0 0 0 1 0 1
3 2000.0 0.0 0.0 0.0 14.0 9 2.5 3.0 4 30 ... 0 1 1 0 1 0 0 1 0 0
4 300.0 0.0 100.0 10.0 3.0 6 2.0 4.0 6 45 ... 1 1 0 0 0 0 0 1 0 0

5 rows × 24 columns

In [61]:
def mc_heatmap(df, figsize=(20,20)):
    sns.set(style="whitegrid")
    corr = df.corr() #co-variance matrix
    # Generate a mask the size of our covariance matrix
    mask = np.zeros_like(corr, dtype=np.bool)
    mask[np.triu_indices_from(mask)] = True
    # Set up the matplotlib figure
    f, ax = plt.subplots(figsize=figsize)
    # Generate a custom diverging colormap
    cmap = sns.diverging_palette(220, 10, as_cmap=True)
    # Draw the heatmap with the mask and correct aspect ratio
    sns.heatmap(corr, mask=mask, cmap=cmap, center=0, square=True, linewidths=.5, cbar_kws={"shrink": .5}, vmax=corr[corr != 1.0].max().max());
In [62]:
mc_heatmap(transformed_data)
In [63]:
#log transforming the numerical columns as they found to be skewed
numerical_columns = ['accommodates', 'bathrooms', 'cleaning_fee', 'extra_people', 'host_listings_count', 'number_of_reviews', 'price', 'security_deposit','beds','guests_included']
for col in numerical_columns:
    transformed_data[col] = transformed_data[col].astype('float64').replace(0.0, 0.01) # Replacing 0s with 0.01
    transformed_data[col] = np.log(transformed_data[col])
In [64]:
#Defining predictors and response variables
X = transformed_data.drop('price', axis=1)
y = transformed_data.price
# Scaling the predictors using StandardScaler
scaler = StandardScaler()
X = pd.DataFrame(scaler.fit_transform(X), columns=list(X.columns))
In [65]:
#Checking for multicollinearity using Variance Inflation Factor (values>10 should be checked for mc)
vif = pd.DataFrame()
vif["VIF Factor"] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
vif["features"] = X.columns
vif.round(1)
Out[65]:
VIF Factor features
0 1.2 security_deposit
1 1.3 cleaning_fee
2 2.5 extra_people
3 1.5 host_listings_count
4 1.6 accommodates
5 1.1 bathrooms
6 1.2 beds
7 2.4 guests_included
8 7.3 number_of_reviews
9 8.5 host_response_rate_100%
10 3.9 host_response_rate_50-89%
11 4.2 host_response_rate_90-99%
12 7.4 host_response_rate_unknown
13 1.3 host_is_superhost_t
14 1.0 host_has_profile_pic_t
15 1.1 host_identity_verified_t
16 1.1 property_type_House
17 1.1 property_type_Other
18 1.3 room_type_Private room
19 3.5 review_scores_value_10/10
20 3.3 review_scores_value_9/10
21 6.8 review_scores_value_nan
22 1.2 instant_bookable_t
In [74]:
# Splitting into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=20)

Linear Model

In [75]:
Lmodel=LinearRegression(fit_intercept=True).fit(X_train,y_train)
Lmodel.coef_,Lmodel.intercept_
predicted_values=Lmodel.predict(X_test)
mean_squared_error(y_test,predicted_values)


training_regr = Lmodel.predict(X_train)
test_regr = Lmodel.predict(X_test)
print("\nTraining MSE:", round(mean_squared_error(y_train, training_regr),4))
print("Validation MSE:", round(mean_squared_error(y_test, test_regr),4))
Training MSE: 0.1993
Validation MSE: 0.2425

Random Forests

In [76]:
RF = RandomForestRegressor(n_estimators = 60)
RF.fit(X_train,y_train)
training_rfreg = RF.predict(X_train)
val_preds_rfreg = RF.predict(X_test)
In [77]:
print("\nTraining MSE:", round(mean_squared_error(y_train, training_rfreg),4))
print("Validation MSE:", round(mean_squared_error(y_test, val_preds_rfreg),4))
Training MSE: 0.0274
Validation MSE: 0.2243

XGBoost

In [78]:
xgb_regr = xgb.XGBRegressor()
xgb_regr.fit(X_train, y_train)

xgb_reg_train = xgb_regr.predict(X_train) #on train set
# Validate
xgb_reg_test = xgb_regr.predict(X_test) # on test set

print("\nTraining MSE:", round(mean_squared_error(y_train, xgb_reg_train),4))
print("Validation MSE:", round(mean_squared_error(y_test, xgb_reg_test),4))
Training MSE: 0.0154
Validation MSE: 0.2309

Neural Networks

In [79]:
#2 hidden layers
nnet_2 = models.Sequential()
nnet_2.add(layers.Dense(128, input_shape=(X_train.shape[1],), activation='relu'))
nnet_2.add(layers.Dense(256, activation='relu'))
nnet_2.add(layers.Dense(256, activation='relu'))
nnet_2.add(layers.Dense(1, activation='linear'))  #linear activation since its regression 

# Compiling the model and summary
nnet_2.compile(loss='mean_squared_error',optimizer='adam',metrics=['mean_squared_error'])
print(nnet_2.summary())
Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_5 (Dense)              (None, 128)               3072      
_________________________________________________________________
dense_6 (Dense)              (None, 256)               33024     
_________________________________________________________________
dense_7 (Dense)              (None, 256)               65792     
_________________________________________________________________
dense_8 (Dense)              (None, 1)                 257       
=================================================================
Total params: 102,145
Trainable params: 102,145
Non-trainable params: 0
_________________________________________________________________
None
In [80]:
# Model Training and validating on 10%
nnet2_train = nnet_2.fit(X_train,
                  y_train,
                  epochs=150,
                  batch_size=256,
                  validation_split = 0.1)
Train on 984 samples, validate on 110 samples
Epoch 1/150
984/984 [==============================] - 0s 109us/step - loss: 23.3521 - mean_squared_error: 23.3521 - val_loss: 11.8711 - val_mean_squared_error: 11.8711
Epoch 2/150
984/984 [==============================] - 0s 13us/step - loss: 7.5773 - mean_squared_error: 7.5773 - val_loss: 1.5921 - val_mean_squared_error: 1.5921
Epoch 3/150
984/984 [==============================] - 0s 11us/step - loss: 4.3512 - mean_squared_error: 4.3512 - val_loss: 3.3289 - val_mean_squared_error: 3.3289
Epoch 4/150
984/984 [==============================] - 0s 13us/step - loss: 3.8168 - mean_squared_error: 3.8168 - val_loss: 0.7861 - val_mean_squared_error: 0.7861
Epoch 5/150
984/984 [==============================] - 0s 13us/step - loss: 1.5704 - mean_squared_error: 1.5704 - val_loss: 1.3236 - val_mean_squared_error: 1.3236
Epoch 6/150
984/984 [==============================] - 0s 13us/step - loss: 1.8392 - mean_squared_error: 1.8392 - val_loss: 1.2823 - val_mean_squared_error: 1.2823
Epoch 7/150
984/984 [==============================] - 0s 12us/step - loss: 1.3170 - mean_squared_error: 1.3170 - val_loss: 0.4969 - val_mean_squared_error: 0.4969
Epoch 8/150
984/984 [==============================] - 0s 12us/step - loss: 0.7886 - mean_squared_error: 0.7886 - val_loss: 0.5784 - val_mean_squared_error: 0.5784
Epoch 9/150
984/984 [==============================] - 0s 12us/step - loss: 0.8694 - mean_squared_error: 0.8694 - val_loss: 0.5264 - val_mean_squared_error: 0.5264
Epoch 10/150
984/984 [==============================] - 0s 11us/step - loss: 0.6298 - mean_squared_error: 0.6298 - val_loss: 0.3518 - val_mean_squared_error: 0.3518
Epoch 11/150
984/984 [==============================] - 0s 11us/step - loss: 0.5570 - mean_squared_error: 0.5570 - val_loss: 0.4249 - val_mean_squared_error: 0.4249
Epoch 12/150
984/984 [==============================] - 0s 11us/step - loss: 0.5455 - mean_squared_error: 0.5455 - val_loss: 0.3447 - val_mean_squared_error: 0.3447
Epoch 13/150
984/984 [==============================] - 0s 12us/step - loss: 0.4388 - mean_squared_error: 0.4388 - val_loss: 0.3435 - val_mean_squared_error: 0.3435
Epoch 14/150
984/984 [==============================] - 0s 11us/step - loss: 0.4231 - mean_squared_error: 0.4231 - val_loss: 0.3403 - val_mean_squared_error: 0.3403
Epoch 15/150
984/984 [==============================] - 0s 11us/step - loss: 0.3718 - mean_squared_error: 0.3718 - val_loss: 0.3012 - val_mean_squared_error: 0.3012
Epoch 16/150
984/984 [==============================] - 0s 12us/step - loss: 0.3502 - mean_squared_error: 0.3502 - val_loss: 0.2986 - val_mean_squared_error: 0.2986
Epoch 17/150
984/984 [==============================] - 0s 12us/step - loss: 0.3302 - mean_squared_error: 0.3302 - val_loss: 0.2722 - val_mean_squared_error: 0.2722
Epoch 18/150
984/984 [==============================] - 0s 12us/step - loss: 0.3004 - mean_squared_error: 0.3004 - val_loss: 0.2700 - val_mean_squared_error: 0.2700
Epoch 19/150
984/984 [==============================] - 0s 11us/step - loss: 0.2901 - mean_squared_error: 0.2901 - val_loss: 0.2555 - val_mean_squared_error: 0.2555
Epoch 20/150
984/984 [==============================] - 0s 12us/step - loss: 0.2699 - mean_squared_error: 0.2699 - val_loss: 0.2481 - val_mean_squared_error: 0.2481
Epoch 21/150
984/984 [==============================] - 0s 11us/step - loss: 0.2589 - mean_squared_error: 0.2589 - val_loss: 0.2468 - val_mean_squared_error: 0.2468
Epoch 22/150
984/984 [==============================] - 0s 11us/step - loss: 0.2456 - mean_squared_error: 0.2456 - val_loss: 0.2483 - val_mean_squared_error: 0.2483
Epoch 23/150
984/984 [==============================] - 0s 12us/step - loss: 0.2366 - mean_squared_error: 0.2366 - val_loss: 0.2485 - val_mean_squared_error: 0.2485
Epoch 24/150
984/984 [==============================] - 0s 12us/step - loss: 0.2264 - mean_squared_error: 0.2264 - val_loss: 0.2404 - val_mean_squared_error: 0.2404
Epoch 25/150
984/984 [==============================] - 0s 12us/step - loss: 0.2194 - mean_squared_error: 0.2194 - val_loss: 0.2357 - val_mean_squared_error: 0.2357
Epoch 26/150
984/984 [==============================] - 0s 11us/step - loss: 0.2100 - mean_squared_error: 0.2100 - val_loss: 0.2356 - val_mean_squared_error: 0.2356
Epoch 27/150
984/984 [==============================] - 0s 11us/step - loss: 0.2034 - mean_squared_error: 0.2034 - val_loss: 0.2336 - val_mean_squared_error: 0.2336
Epoch 28/150
984/984 [==============================] - 0s 10us/step - loss: 0.1967 - mean_squared_error: 0.1967 - val_loss: 0.2324 - val_mean_squared_error: 0.2324
Epoch 29/150
984/984 [==============================] - 0s 12us/step - loss: 0.1913 - mean_squared_error: 0.1913 - val_loss: 0.2297 - val_mean_squared_error: 0.2297
Epoch 30/150
984/984 [==============================] - 0s 10us/step - loss: 0.1854 - mean_squared_error: 0.1854 - val_loss: 0.2318 - val_mean_squared_error: 0.2318
Epoch 31/150
984/984 [==============================] - 0s 11us/step - loss: 0.1808 - mean_squared_error: 0.1808 - val_loss: 0.2287 - val_mean_squared_error: 0.2287
Epoch 32/150
984/984 [==============================] - 0s 11us/step - loss: 0.1756 - mean_squared_error: 0.1756 - val_loss: 0.2286 - val_mean_squared_error: 0.2286
Epoch 33/150
984/984 [==============================] - 0s 11us/step - loss: 0.1710 - mean_squared_error: 0.1710 - val_loss: 0.2270 - val_mean_squared_error: 0.2270
Epoch 34/150
984/984 [==============================] - 0s 10us/step - loss: 0.1668 - mean_squared_error: 0.1668 - val_loss: 0.2281 - val_mean_squared_error: 0.2281
Epoch 35/150
984/984 [==============================] - 0s 11us/step - loss: 0.1630 - mean_squared_error: 0.1630 - val_loss: 0.2282 - val_mean_squared_error: 0.2282
Epoch 36/150
984/984 [==============================] - 0s 11us/step - loss: 0.1593 - mean_squared_error: 0.1593 - val_loss: 0.2263 - val_mean_squared_error: 0.2263
Epoch 37/150
984/984 [==============================] - 0s 11us/step - loss: 0.1560 - mean_squared_error: 0.1560 - val_loss: 0.2301 - val_mean_squared_error: 0.2301
Epoch 38/150
984/984 [==============================] - 0s 12us/step - loss: 0.1523 - mean_squared_error: 0.1523 - val_loss: 0.2308 - val_mean_squared_error: 0.2308
Epoch 39/150
984/984 [==============================] - 0s 13us/step - loss: 0.1488 - mean_squared_error: 0.1488 - val_loss: 0.2309 - val_mean_squared_error: 0.2309
Epoch 40/150
984/984 [==============================] - 0s 13us/step - loss: 0.1458 - mean_squared_error: 0.1458 - val_loss: 0.2287 - val_mean_squared_error: 0.2287
Epoch 41/150
984/984 [==============================] - 0s 11us/step - loss: 0.1424 - mean_squared_error: 0.1424 - val_loss: 0.2298 - val_mean_squared_error: 0.2298
Epoch 42/150
984/984 [==============================] - 0s 10us/step - loss: 0.1398 - mean_squared_error: 0.1398 - val_loss: 0.2273 - val_mean_squared_error: 0.2273
Epoch 43/150
984/984 [==============================] - 0s 11us/step - loss: 0.1376 - mean_squared_error: 0.1376 - val_loss: 0.2280 - val_mean_squared_error: 0.2280
Epoch 44/150
984/984 [==============================] - 0s 11us/step - loss: 0.1344 - mean_squared_error: 0.1344 - val_loss: 0.2313 - val_mean_squared_error: 0.2313
Epoch 45/150
984/984 [==============================] - 0s 11us/step - loss: 0.1325 - mean_squared_error: 0.1325 - val_loss: 0.2277 - val_mean_squared_error: 0.2277
Epoch 46/150
984/984 [==============================] - 0s 11us/step - loss: 0.1296 - mean_squared_error: 0.1296 - val_loss: 0.2310 - val_mean_squared_error: 0.2310
Epoch 47/150
984/984 [==============================] - 0s 11us/step - loss: 0.1278 - mean_squared_error: 0.1278 - val_loss: 0.2315 - val_mean_squared_error: 0.2315
Epoch 48/150
984/984 [==============================] - 0s 11us/step - loss: 0.1251 - mean_squared_error: 0.1251 - val_loss: 0.2261 - val_mean_squared_error: 0.2261
Epoch 49/150
984/984 [==============================] - 0s 10us/step - loss: 0.1235 - mean_squared_error: 0.1235 - val_loss: 0.2287 - val_mean_squared_error: 0.2287
Epoch 50/150
984/984 [==============================] - 0s 11us/step - loss: 0.1211 - mean_squared_error: 0.1211 - val_loss: 0.2276 - val_mean_squared_error: 0.2276
Epoch 51/150
984/984 [==============================] - 0s 11us/step - loss: 0.1190 - mean_squared_error: 0.1190 - val_loss: 0.2304 - val_mean_squared_error: 0.2304
Epoch 52/150
984/984 [==============================] - 0s 11us/step - loss: 0.1169 - mean_squared_error: 0.1169 - val_loss: 0.2292 - val_mean_squared_error: 0.2292
Epoch 53/150
984/984 [==============================] - 0s 10us/step - loss: 0.1153 - mean_squared_error: 0.1153 - val_loss: 0.2287 - val_mean_squared_error: 0.2287
Epoch 54/150
984/984 [==============================] - 0s 11us/step - loss: 0.1134 - mean_squared_error: 0.1134 - val_loss: 0.2300 - val_mean_squared_error: 0.2300
Epoch 55/150
984/984 [==============================] - 0s 11us/step - loss: 0.1118 - mean_squared_error: 0.1118 - val_loss: 0.2255 - val_mean_squared_error: 0.2255
Epoch 56/150
984/984 [==============================] - 0s 11us/step - loss: 0.1100 - mean_squared_error: 0.1100 - val_loss: 0.2280 - val_mean_squared_error: 0.2280
Epoch 57/150
984/984 [==============================] - 0s 10us/step - loss: 0.1083 - mean_squared_error: 0.1083 - val_loss: 0.2268 - val_mean_squared_error: 0.2268
Epoch 58/150
984/984 [==============================] - 0s 10us/step - loss: 0.1068 - mean_squared_error: 0.1068 - val_loss: 0.2283 - val_mean_squared_error: 0.2283
Epoch 59/150
984/984 [==============================] - 0s 11us/step - loss: 0.1052 - mean_squared_error: 0.1052 - val_loss: 0.2302 - val_mean_squared_error: 0.2302
Epoch 60/150
984/984 [==============================] - 0s 12us/step - loss: 0.1031 - mean_squared_error: 0.1031 - val_loss: 0.2270 - val_mean_squared_error: 0.2270
Epoch 61/150
984/984 [==============================] - 0s 10us/step - loss: 0.1013 - mean_squared_error: 0.1013 - val_loss: 0.2295 - val_mean_squared_error: 0.2295
Epoch 62/150
984/984 [==============================] - 0s 10us/step - loss: 0.0999 - mean_squared_error: 0.0999 - val_loss: 0.2302 - val_mean_squared_error: 0.2302
Epoch 63/150
984/984 [==============================] - 0s 11us/step - loss: 0.0983 - mean_squared_error: 0.0983 - val_loss: 0.2277 - val_mean_squared_error: 0.2277
Epoch 64/150
984/984 [==============================] - 0s 10us/step - loss: 0.0969 - mean_squared_error: 0.0969 - val_loss: 0.2292 - val_mean_squared_error: 0.2292
Epoch 65/150
984/984 [==============================] - 0s 10us/step - loss: 0.0952 - mean_squared_error: 0.0952 - val_loss: 0.2312 - val_mean_squared_error: 0.2312
Epoch 66/150
984/984 [==============================] - 0s 10us/step - loss: 0.0938 - mean_squared_error: 0.0938 - val_loss: 0.2300 - val_mean_squared_error: 0.2300
Epoch 67/150
984/984 [==============================] - 0s 11us/step - loss: 0.0924 - mean_squared_error: 0.0924 - val_loss: 0.2297 - val_mean_squared_error: 0.2297
Epoch 68/150
984/984 [==============================] - 0s 12us/step - loss: 0.0911 - mean_squared_error: 0.0911 - val_loss: 0.2303 - val_mean_squared_error: 0.2303
Epoch 69/150
984/984 [==============================] - 0s 11us/step - loss: 0.0898 - mean_squared_error: 0.0898 - val_loss: 0.2306 - val_mean_squared_error: 0.2306
Epoch 70/150
984/984 [==============================] - 0s 10us/step - loss: 0.0887 - mean_squared_error: 0.0887 - val_loss: 0.2301 - val_mean_squared_error: 0.2301
Epoch 71/150
984/984 [==============================] - 0s 11us/step - loss: 0.0883 - mean_squared_error: 0.0883 - val_loss: 0.2340 - val_mean_squared_error: 0.2340
Epoch 72/150
984/984 [==============================] - 0s 10us/step - loss: 0.0853 - mean_squared_error: 0.0853 - val_loss: 0.2292 - val_mean_squared_error: 0.2292
Epoch 73/150
984/984 [==============================] - 0s 10us/step - loss: 0.0850 - mean_squared_error: 0.0850 - val_loss: 0.2318 - val_mean_squared_error: 0.2318
Epoch 74/150
984/984 [==============================] - 0s 11us/step - loss: 0.0834 - mean_squared_error: 0.0834 - val_loss: 0.2319 - val_mean_squared_error: 0.2319
Epoch 75/150
984/984 [==============================] - 0s 11us/step - loss: 0.0824 - mean_squared_error: 0.0824 - val_loss: 0.2339 - val_mean_squared_error: 0.2339
Epoch 76/150
984/984 [==============================] - 0s 11us/step - loss: 0.0812 - mean_squared_error: 0.0812 - val_loss: 0.2319 - val_mean_squared_error: 0.2319
Epoch 77/150
984/984 [==============================] - 0s 10us/step - loss: 0.0799 - mean_squared_error: 0.0799 - val_loss: 0.2282 - val_mean_squared_error: 0.2282
Epoch 78/150
984/984 [==============================] - 0s 11us/step - loss: 0.0785 - mean_squared_error: 0.0785 - val_loss: 0.2352 - val_mean_squared_error: 0.2352
Epoch 79/150
984/984 [==============================] - 0s 11us/step - loss: 0.0769 - mean_squared_error: 0.0769 - val_loss: 0.2330 - val_mean_squared_error: 0.2330
Epoch 80/150
984/984 [==============================] - 0s 10us/step - loss: 0.0764 - mean_squared_error: 0.0764 - val_loss: 0.2358 - val_mean_squared_error: 0.2358
Epoch 81/150
984/984 [==============================] - 0s 10us/step - loss: 0.0761 - mean_squared_error: 0.0761 - val_loss: 0.2322 - val_mean_squared_error: 0.2322
Epoch 82/150
984/984 [==============================] - 0s 11us/step - loss: 0.0747 - mean_squared_error: 0.0747 - val_loss: 0.2382 - val_mean_squared_error: 0.2382
Epoch 83/150
984/984 [==============================] - 0s 11us/step - loss: 0.0733 - mean_squared_error: 0.0733 - val_loss: 0.2322 - val_mean_squared_error: 0.2322
Epoch 84/150
984/984 [==============================] - 0s 11us/step - loss: 0.0727 - mean_squared_error: 0.0727 - val_loss: 0.2365 - val_mean_squared_error: 0.2365
Epoch 85/150
984/984 [==============================] - 0s 10us/step - loss: 0.0717 - mean_squared_error: 0.0717 - val_loss: 0.2354 - val_mean_squared_error: 0.2354
Epoch 86/150
984/984 [==============================] - 0s 11us/step - loss: 0.0703 - mean_squared_error: 0.0703 - val_loss: 0.2327 - val_mean_squared_error: 0.2327
Epoch 87/150
984/984 [==============================] - 0s 12us/step - loss: 0.0690 - mean_squared_error: 0.0690 - val_loss: 0.2355 - val_mean_squared_error: 0.2355
Epoch 88/150
984/984 [==============================] - 0s 12us/step - loss: 0.0680 - mean_squared_error: 0.0680 - val_loss: 0.2378 - val_mean_squared_error: 0.2378
Epoch 89/150
984/984 [==============================] - 0s 11us/step - loss: 0.0672 - mean_squared_error: 0.0672 - val_loss: 0.2357 - val_mean_squared_error: 0.2357
Epoch 90/150
984/984 [==============================] - 0s 11us/step - loss: 0.0661 - mean_squared_error: 0.0661 - val_loss: 0.2350 - val_mean_squared_error: 0.2350
Epoch 91/150
984/984 [==============================] - 0s 11us/step - loss: 0.0658 - mean_squared_error: 0.0658 - val_loss: 0.2392 - val_mean_squared_error: 0.2392
Epoch 92/150
984/984 [==============================] - 0s 10us/step - loss: 0.0650 - mean_squared_error: 0.0650 - val_loss: 0.2380 - val_mean_squared_error: 0.2380
Epoch 93/150
984/984 [==============================] - 0s 10us/step - loss: 0.0636 - mean_squared_error: 0.0636 - val_loss: 0.2394 - val_mean_squared_error: 0.2394
Epoch 94/150
984/984 [==============================] - 0s 11us/step - loss: 0.0626 - mean_squared_error: 0.0626 - val_loss: 0.2414 - val_mean_squared_error: 0.2414
Epoch 95/150
984/984 [==============================] - 0s 11us/step - loss: 0.0618 - mean_squared_error: 0.0618 - val_loss: 0.2378 - val_mean_squared_error: 0.2378
Epoch 96/150
984/984 [==============================] - 0s 11us/step - loss: 0.0611 - mean_squared_error: 0.0611 - val_loss: 0.2362 - val_mean_squared_error: 0.2362
Epoch 97/150
984/984 [==============================] - 0s 10us/step - loss: 0.0602 - mean_squared_error: 0.0602 - val_loss: 0.2398 - val_mean_squared_error: 0.2398
Epoch 98/150
984/984 [==============================] - 0s 11us/step - loss: 0.0594 - mean_squared_error: 0.0594 - val_loss: 0.2370 - val_mean_squared_error: 0.2370
Epoch 99/150
984/984 [==============================] - 0s 10us/step - loss: 0.0589 - mean_squared_error: 0.0589 - val_loss: 0.2385 - val_mean_squared_error: 0.2385
Epoch 100/150
984/984 [==============================] - 0s 11us/step - loss: 0.0577 - mean_squared_error: 0.0577 - val_loss: 0.2410 - val_mean_squared_error: 0.2410
Epoch 101/150
984/984 [==============================] - 0s 10us/step - loss: 0.0571 - mean_squared_error: 0.0571 - val_loss: 0.2383 - val_mean_squared_error: 0.2383
Epoch 102/150
984/984 [==============================] - 0s 11us/step - loss: 0.0566 - mean_squared_error: 0.0566 - val_loss: 0.2417 - val_mean_squared_error: 0.2417
Epoch 103/150
984/984 [==============================] - 0s 10us/step - loss: 0.0557 - mean_squared_error: 0.0557 - val_loss: 0.2412 - val_mean_squared_error: 0.2412
Epoch 104/150
984/984 [==============================] - 0s 11us/step - loss: 0.0550 - mean_squared_error: 0.0550 - val_loss: 0.2404 - val_mean_squared_error: 0.2404
Epoch 105/150
984/984 [==============================] - 0s 10us/step - loss: 0.0544 - mean_squared_error: 0.0544 - val_loss: 0.2447 - val_mean_squared_error: 0.2447
Epoch 106/150
984/984 [==============================] - 0s 12us/step - loss: 0.0536 - mean_squared_error: 0.0536 - val_loss: 0.2408 - val_mean_squared_error: 0.2408
Epoch 107/150
984/984 [==============================] - 0s 11us/step - loss: 0.0529 - mean_squared_error: 0.0529 - val_loss: 0.2419 - val_mean_squared_error: 0.2419
Epoch 108/150
984/984 [==============================] - 0s 10us/step - loss: 0.0522 - mean_squared_error: 0.0522 - val_loss: 0.2440 - val_mean_squared_error: 0.2440
Epoch 109/150
984/984 [==============================] - 0s 10us/step - loss: 0.0515 - mean_squared_error: 0.0515 - val_loss: 0.2451 - val_mean_squared_error: 0.2451
Epoch 110/150
984/984 [==============================] - 0s 11us/step - loss: 0.0509 - mean_squared_error: 0.0509 - val_loss: 0.2419 - val_mean_squared_error: 0.2419
Epoch 111/150
984/984 [==============================] - 0s 11us/step - loss: 0.0500 - mean_squared_error: 0.0500 - val_loss: 0.2463 - val_mean_squared_error: 0.2463
Epoch 112/150
984/984 [==============================] - 0s 11us/step - loss: 0.0502 - mean_squared_error: 0.0502 - val_loss: 0.2431 - val_mean_squared_error: 0.2431
Epoch 113/150
984/984 [==============================] - 0s 10us/step - loss: 0.0490 - mean_squared_error: 0.0490 - val_loss: 0.2454 - val_mean_squared_error: 0.2454
Epoch 114/150
984/984 [==============================] - 0s 11us/step - loss: 0.0484 - mean_squared_error: 0.0484 - val_loss: 0.2440 - val_mean_squared_error: 0.2440
Epoch 115/150
984/984 [==============================] - 0s 10us/step - loss: 0.0477 - mean_squared_error: 0.0477 - val_loss: 0.2473 - val_mean_squared_error: 0.2473
Epoch 116/150
984/984 [==============================] - 0s 10us/step - loss: 0.0471 - mean_squared_error: 0.0471 - val_loss: 0.2457 - val_mean_squared_error: 0.2457
Epoch 117/150
984/984 [==============================] - 0s 11us/step - loss: 0.0467 - mean_squared_error: 0.0467 - val_loss: 0.2466 - val_mean_squared_error: 0.2466
Epoch 118/150
984/984 [==============================] - 0s 11us/step - loss: 0.0457 - mean_squared_error: 0.0457 - val_loss: 0.2456 - val_mean_squared_error: 0.2456
Epoch 119/150
984/984 [==============================] - 0s 10us/step - loss: 0.0455 - mean_squared_error: 0.0455 - val_loss: 0.2491 - val_mean_squared_error: 0.2491
Epoch 120/150
984/984 [==============================] - 0s 11us/step - loss: 0.0451 - mean_squared_error: 0.0451 - val_loss: 0.2456 - val_mean_squared_error: 0.2456
Epoch 121/150
984/984 [==============================] - 0s 12us/step - loss: 0.0442 - mean_squared_error: 0.0442 - val_loss: 0.2512 - val_mean_squared_error: 0.2512
Epoch 122/150
984/984 [==============================] - 0s 13us/step - loss: 0.0444 - mean_squared_error: 0.0444 - val_loss: 0.2450 - val_mean_squared_error: 0.2450
Epoch 123/150
984/984 [==============================] - 0s 12us/step - loss: 0.0438 - mean_squared_error: 0.0438 - val_loss: 0.2530 - val_mean_squared_error: 0.2530
Epoch 124/150
984/984 [==============================] - 0s 12us/step - loss: 0.0436 - mean_squared_error: 0.0436 - val_loss: 0.2514 - val_mean_squared_error: 0.2514
Epoch 125/150
984/984 [==============================] - 0s 11us/step - loss: 0.0424 - mean_squared_error: 0.0424 - val_loss: 0.2520 - val_mean_squared_error: 0.2520
Epoch 126/150
984/984 [==============================] - 0s 10us/step - loss: 0.0420 - mean_squared_error: 0.0420 - val_loss: 0.2457 - val_mean_squared_error: 0.2457
Epoch 127/150
984/984 [==============================] - 0s 18us/step - loss: 0.0418 - mean_squared_error: 0.0418 - val_loss: 0.2548 - val_mean_squared_error: 0.2548
Epoch 128/150
984/984 [==============================] - 0s 11us/step - loss: 0.0408 - mean_squared_error: 0.0408 - val_loss: 0.2503 - val_mean_squared_error: 0.2503
Epoch 129/150
984/984 [==============================] - 0s 11us/step - loss: 0.0403 - mean_squared_error: 0.0403 - val_loss: 0.2524 - val_mean_squared_error: 0.2524
Epoch 130/150
984/984 [==============================] - 0s 11us/step - loss: 0.0402 - mean_squared_error: 0.0402 - val_loss: 0.2528 - val_mean_squared_error: 0.2528
Epoch 131/150
984/984 [==============================] - 0s 10us/step - loss: 0.0396 - mean_squared_error: 0.0396 - val_loss: 0.2530 - val_mean_squared_error: 0.2530
Epoch 132/150
984/984 [==============================] - 0s 11us/step - loss: 0.0391 - mean_squared_error: 0.0391 - val_loss: 0.2554 - val_mean_squared_error: 0.2554
Epoch 133/150
984/984 [==============================] - 0s 10us/step - loss: 0.0388 - mean_squared_error: 0.0388 - val_loss: 0.2454 - val_mean_squared_error: 0.2454
Epoch 134/150
984/984 [==============================] - 0s 12us/step - loss: 0.0384 - mean_squared_error: 0.0384 - val_loss: 0.2632 - val_mean_squared_error: 0.2632
Epoch 135/150
984/984 [==============================] - 0s 11us/step - loss: 0.0388 - mean_squared_error: 0.0388 - val_loss: 0.2519 - val_mean_squared_error: 0.2519
Epoch 136/150
984/984 [==============================] - 0s 10us/step - loss: 0.0377 - mean_squared_error: 0.0377 - val_loss: 0.2522 - val_mean_squared_error: 0.2522
Epoch 137/150
984/984 [==============================] - 0s 11us/step - loss: 0.0377 - mean_squared_error: 0.0377 - val_loss: 0.2558 - val_mean_squared_error: 0.2558
Epoch 138/150
984/984 [==============================] - 0s 11us/step - loss: 0.0376 - mean_squared_error: 0.0376 - val_loss: 0.2535 - val_mean_squared_error: 0.2535
Epoch 139/150
984/984 [==============================] - 0s 11us/step - loss: 0.0359 - mean_squared_error: 0.0359 - val_loss: 0.2563 - val_mean_squared_error: 0.2563
Epoch 140/150
984/984 [==============================] - 0s 10us/step - loss: 0.0359 - mean_squared_error: 0.0359 - val_loss: 0.2554 - val_mean_squared_error: 0.2554
Epoch 141/150
984/984 [==============================] - 0s 12us/step - loss: 0.0353 - mean_squared_error: 0.0353 - val_loss: 0.2491 - val_mean_squared_error: 0.2491
Epoch 142/150
984/984 [==============================] - 0s 11us/step - loss: 0.0352 - mean_squared_error: 0.0352 - val_loss: 0.2550 - val_mean_squared_error: 0.2550
Epoch 143/150
984/984 [==============================] - 0s 11us/step - loss: 0.0346 - mean_squared_error: 0.0346 - val_loss: 0.2570 - val_mean_squared_error: 0.2570
Epoch 144/150
984/984 [==============================] - 0s 11us/step - loss: 0.0339 - mean_squared_error: 0.0339 - val_loss: 0.2555 - val_mean_squared_error: 0.2555
Epoch 145/150
984/984 [==============================] - 0s 11us/step - loss: 0.0340 - mean_squared_error: 0.0340 - val_loss: 0.2594 - val_mean_squared_error: 0.2594
Epoch 146/150
984/984 [==============================] - 0s 11us/step - loss: 0.0331 - mean_squared_error: 0.0331 - val_loss: 0.2527 - val_mean_squared_error: 0.2527
Epoch 147/150
984/984 [==============================] - 0s 11us/step - loss: 0.0332 - mean_squared_error: 0.0332 - val_loss: 0.2566 - val_mean_squared_error: 0.2566
Epoch 148/150
984/984 [==============================] - 0s 10us/step - loss: 0.0324 - mean_squared_error: 0.0324 - val_loss: 0.2598 - val_mean_squared_error: 0.2598
Epoch 149/150
984/984 [==============================] - 0s 11us/step - loss: 0.0322 - mean_squared_error: 0.0322 - val_loss: 0.2571 - val_mean_squared_error: 0.2571
Epoch 150/150
984/984 [==============================] - 0s 11us/step - loss: 0.0317 - mean_squared_error: 0.0317 - val_loss: 0.2570 - val_mean_squared_error: 0.2570
In [81]:
# MSE values
nnet2_test_pred = nnet_2.predict(X_test)
nnet2_train_pred = nnet_2.predict(X_train)
print("Training MSE:", round(mean_squared_error(y_train, nnet2_train_pred),4))
print("Validation MSE:", round(mean_squared_error(y_test, nnet2_test_pred),4))
Training MSE: 0.0535
Validation MSE: 0.5782

Neural Net with 3 hidden layers and other configurations

In [82]:
# neural net with 3 hidden layers
nnet_3 = models.Sequential()
nnet_3.add(layers.Dense(128, input_shape=(X_train.shape[1],), kernel_regularizer=regularizers.l1(0.005), activation='relu'))
nnet_3.add(layers.Dense(256, kernel_regularizer=regularizers.l1(0.005), activation='relu'))
nnet_3.add(layers.Dense(256, kernel_regularizer=regularizers.l1(0.005), activation='relu'))
nnet_3.add(layers.Dense(512, kernel_regularizer=regularizers.l1(0.005), activation='relu'))
nnet_3.add(layers.Dense(1, activation='linear'))

# Model compilation
nnet_3.compile(loss='mean_squared_error',
            optimizer='adam',
            metrics=['mean_squared_error'])
print(nnet_3.summary())
Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_9 (Dense)              (None, 128)               3072      
_________________________________________________________________
dense_10 (Dense)             (None, 256)               33024     
_________________________________________________________________
dense_11 (Dense)             (None, 256)               65792     
_________________________________________________________________
dense_12 (Dense)             (None, 512)               131584    
_________________________________________________________________
dense_13 (Dense)             (None, 1)                 513       
=================================================================
Total params: 233,985
Trainable params: 233,985
Non-trainable params: 0
_________________________________________________________________
None
In [83]:
# Training the model
nnet_3_train = nnet_3.fit(X_train,
                  y_train,
                  epochs=150,
                  batch_size=256,
                  validation_split = 0.1)
Train on 984 samples, validate on 110 samples
Epoch 1/150
984/984 [==============================] - 0s 132us/step - loss: 80.3673 - mean_squared_error: 22.5474 - val_loss: 64.9817 - val_mean_squared_error: 8.2024
Epoch 2/150
984/984 [==============================] - 0s 17us/step - loss: 61.4236 - mean_squared_error: 5.2392 - val_loss: 59.3525 - val_mean_squared_error: 4.3286
Epoch 3/150
984/984 [==============================] - 0s 17us/step - loss: 58.7963 - mean_squared_error: 4.4872 - val_loss: 54.0432 - val_mean_squared_error: 1.0213
Epoch 4/150
984/984 [==============================] - 0s 18us/step - loss: 54.3594 - mean_squared_error: 2.0722 - val_loss: 53.3757 - val_mean_squared_error: 2.3955
Epoch 5/150
984/984 [==============================] - 0s 16us/step - loss: 52.4678 - mean_squared_error: 2.2147 - val_loss: 49.8242 - val_mean_squared_error: 0.8561
Epoch 6/150
984/984 [==============================] - 0s 17us/step - loss: 49.2103 - mean_squared_error: 0.9591 - val_loss: 47.8560 - val_mean_squared_error: 0.8842
Epoch 7/150
984/984 [==============================] - 0s 17us/step - loss: 47.4259 - mean_squared_error: 1.1807 - val_loss: 45.4633 - val_mean_squared_error: 0.5123
Epoch 8/150
984/984 [==============================] - 0s 16us/step - loss: 44.8703 - mean_squared_error: 0.6397 - val_loss: 43.5588 - val_mean_squared_error: 0.5987
Epoch 9/150
984/984 [==============================] - 0s 16us/step - loss: 43.0367 - mean_squared_error: 0.7752 - val_loss: 41.5274 - val_mean_squared_error: 0.4949
Epoch 10/150
984/984 [==============================] - 0s 16us/step - loss: 40.9011 - mean_squared_error: 0.5441 - val_loss: 39.5954 - val_mean_squared_error: 0.4296
Epoch 11/150
984/984 [==============================] - 0s 17us/step - loss: 39.0514 - mean_squared_error: 0.5459 - val_loss: 37.7096 - val_mean_squared_error: 0.3716
Epoch 12/150
984/984 [==============================] - 0s 17us/step - loss: 37.1454 - mean_squared_error: 0.4509 - val_loss: 35.9369 - val_mean_squared_error: 0.3723
Epoch 13/150
984/984 [==============================] - 0s 17us/step - loss: 35.3967 - mean_squared_error: 0.4474 - val_loss: 34.1820 - val_mean_squared_error: 0.3086
Epoch 14/150
984/984 [==============================] - 0s 17us/step - loss: 33.6631 - mean_squared_error: 0.3766 - val_loss: 32.5589 - val_mean_squared_error: 0.3041
Epoch 15/150
984/984 [==============================] - 0s 18us/step - loss: 32.0587 - mean_squared_error: 0.3685 - val_loss: 30.9753 - val_mean_squared_error: 0.2751
Epoch 16/150
984/984 [==============================] - 0s 16us/step - loss: 30.5013 - mean_squared_error: 0.3396 - val_loss: 29.5009 - val_mean_squared_error: 0.2813
Epoch 17/150
984/984 [==============================] - 0s 17us/step - loss: 29.0397 - mean_squared_error: 0.3282 - val_loss: 28.0900 - val_mean_squared_error: 0.2629
Epoch 18/150
984/984 [==============================] - 0s 19us/step - loss: 27.6653 - mean_squared_error: 0.3153 - val_loss: 26.7725 - val_mean_squared_error: 0.2562
Epoch 19/150
984/984 [==============================] - 0s 16us/step - loss: 26.3666 - mean_squared_error: 0.2988 - val_loss: 25.5390 - val_mean_squared_error: 0.2519
Epoch 20/150
984/984 [==============================] - 0s 17us/step - loss: 25.1615 - mean_squared_error: 0.2917 - val_loss: 24.3852 - val_mean_squared_error: 0.2385
Epoch 21/150
984/984 [==============================] - 0s 17us/step - loss: 24.0393 - mean_squared_error: 0.2783 - val_loss: 23.3295 - val_mean_squared_error: 0.2353
Epoch 22/150
984/984 [==============================] - 0s 15us/step - loss: 23.0126 - mean_squared_error: 0.2721 - val_loss: 22.3615 - val_mean_squared_error: 0.2304
Epoch 23/150
984/984 [==============================] - 0s 17us/step - loss: 22.0684 - mean_squared_error: 0.2626 - val_loss: 21.4636 - val_mean_squared_error: 0.2267
Epoch 24/150
984/984 [==============================] - 0s 17us/step - loss: 21.1852 - mean_squared_error: 0.2562 - val_loss: 20.6120 - val_mean_squared_error: 0.2221
Epoch 25/150
984/984 [==============================] - 0s 16us/step - loss: 20.3494 - mean_squared_error: 0.2506 - val_loss: 19.8124 - val_mean_squared_error: 0.2200
Epoch 26/150
984/984 [==============================] - 0s 19us/step - loss: 19.5656 - mean_squared_error: 0.2450 - val_loss: 19.0644 - val_mean_squared_error: 0.2156
Epoch 27/150
984/984 [==============================] - 0s 19us/step - loss: 18.8359 - mean_squared_error: 0.2397 - val_loss: 18.3670 - val_mean_squared_error: 0.2119
Epoch 28/150
984/984 [==============================] - 0s 18us/step - loss: 18.1518 - mean_squared_error: 0.2358 - val_loss: 17.7096 - val_mean_squared_error: 0.2107
Epoch 29/150
984/984 [==============================] - 0s 18us/step - loss: 17.5045 - mean_squared_error: 0.2311 - val_loss: 17.0868 - val_mean_squared_error: 0.2077
Epoch 30/150
984/984 [==============================] - 0s 18us/step - loss: 16.8909 - mean_squared_error: 0.2272 - val_loss: 16.4923 - val_mean_squared_error: 0.2048
Epoch 31/150
984/984 [==============================] - 0s 16us/step - loss: 16.3075 - mean_squared_error: 0.2234 - val_loss: 15.9325 - val_mean_squared_error: 0.2037
Epoch 32/150
984/984 [==============================] - 0s 18us/step - loss: 15.7567 - mean_squared_error: 0.2208 - val_loss: 15.4008 - val_mean_squared_error: 0.2020
Epoch 33/150
984/984 [==============================] - 0s 15us/step - loss: 15.2313 - mean_squared_error: 0.2170 - val_loss: 14.8924 - val_mean_squared_error: 0.2002
Epoch 34/150
984/984 [==============================] - 0s 19us/step - loss: 14.7303 - mean_squared_error: 0.2133 - val_loss: 14.4098 - val_mean_squared_error: 0.2004
Epoch 35/150
984/984 [==============================] - 0s 18us/step - loss: 14.2527 - mean_squared_error: 0.2106 - val_loss: 13.9469 - val_mean_squared_error: 0.1984
Epoch 36/150
984/984 [==============================] - 0s 16us/step - loss: 13.7979 - mean_squared_error: 0.2085 - val_loss: 13.5066 - val_mean_squared_error: 0.1971
Epoch 37/150
984/984 [==============================] - 0s 17us/step - loss: 13.3636 - mean_squared_error: 0.2069 - val_loss: 13.0857 - val_mean_squared_error: 0.1946
Epoch 38/150
984/984 [==============================] - 0s 16us/step - loss: 12.9485 - mean_squared_error: 0.2029 - val_loss: 12.6806 - val_mean_squared_error: 0.1934
Epoch 39/150
984/984 [==============================] - 0s 17us/step - loss: 12.5495 - mean_squared_error: 0.2027 - val_loss: 12.2963 - val_mean_squared_error: 0.1927
Epoch 40/150
984/984 [==============================] - 0s 17us/step - loss: 12.1679 - mean_squared_error: 0.1979 - val_loss: 11.9248 - val_mean_squared_error: 0.1923
Epoch 41/150
984/984 [==============================] - 0s 18us/step - loss: 11.8028 - mean_squared_error: 0.1991 - val_loss: 11.5718 - val_mean_squared_error: 0.1918
Epoch 42/150
984/984 [==============================] - 0s 17us/step - loss: 11.4497 - mean_squared_error: 0.1939 - val_loss: 11.2265 - val_mean_squared_error: 0.1898
Epoch 43/150
984/984 [==============================] - 0s 17us/step - loss: 11.1126 - mean_squared_error: 0.1943 - val_loss: 10.8977 - val_mean_squared_error: 0.1871
Epoch 44/150
984/984 [==============================] - 0s 18us/step - loss: 10.7870 - mean_squared_error: 0.1910 - val_loss: 10.5803 - val_mean_squared_error: 0.1864
Epoch 45/150
984/984 [==============================] - 0s 17us/step - loss: 10.4758 - mean_squared_error: 0.1910 - val_loss: 10.2786 - val_mean_squared_error: 0.1854
Epoch 46/150
984/984 [==============================] - 0s 18us/step - loss: 10.1754 - mean_squared_error: 0.1881 - val_loss: 9.9857 - val_mean_squared_error: 0.1845
Epoch 47/150
984/984 [==============================] - 0s 17us/step - loss: 9.8876 - mean_squared_error: 0.1882 - val_loss: 9.7077 - val_mean_squared_error: 0.1848
Epoch 48/150
984/984 [==============================] - 0s 18us/step - loss: 9.6106 - mean_squared_error: 0.1875 - val_loss: 9.4332 - val_mean_squared_error: 0.1832
Epoch 49/150
984/984 [==============================] - 0s 18us/step - loss: 9.3416 - mean_squared_error: 0.1847 - val_loss: 9.1704 - val_mean_squared_error: 0.1814
Epoch 50/150
984/984 [==============================] - 0s 19us/step - loss: 9.0838 - mean_squared_error: 0.1857 - val_loss: 8.9188 - val_mean_squared_error: 0.1810
Epoch 51/150
984/984 [==============================] - 0s 19us/step - loss: 8.8316 - mean_squared_error: 0.1826 - val_loss: 8.6724 - val_mean_squared_error: 0.1802
Epoch 52/150
984/984 [==============================] - 0s 18us/step - loss: 8.5905 - mean_squared_error: 0.1836 - val_loss: 8.4424 - val_mean_squared_error: 0.1814
Epoch 53/150
984/984 [==============================] - 0s 19us/step - loss: 8.3582 - mean_squared_error: 0.1806 - val_loss: 8.2096 - val_mean_squared_error: 0.1785
Epoch 54/150
984/984 [==============================] - 0s 17us/step - loss: 8.1314 - mean_squared_error: 0.1789 - val_loss: 7.9913 - val_mean_squared_error: 0.1778
Epoch 55/150
984/984 [==============================] - 0s 16us/step - loss: 7.9142 - mean_squared_error: 0.1787 - val_loss: 7.7766 - val_mean_squared_error: 0.1766
Epoch 56/150
984/984 [==============================] - 0s 16us/step - loss: 7.7044 - mean_squared_error: 0.1776 - val_loss: 7.5698 - val_mean_squared_error: 0.1759
Epoch 57/150
984/984 [==============================] - 0s 16us/step - loss: 7.5001 - mean_squared_error: 0.1780 - val_loss: 7.3721 - val_mean_squared_error: 0.1748
Epoch 58/150
984/984 [==============================] - 0s 16us/step - loss: 7.3036 - mean_squared_error: 0.1770 - val_loss: 7.1760 - val_mean_squared_error: 0.1733
Epoch 59/150
984/984 [==============================] - 0s 16us/step - loss: 7.1133 - mean_squared_error: 0.1756 - val_loss: 6.9901 - val_mean_squared_error: 0.1717
Epoch 60/150
984/984 [==============================] - 0s 15us/step - loss: 6.9316 - mean_squared_error: 0.1784 - val_loss: 6.8163 - val_mean_squared_error: 0.1745
Epoch 61/150
984/984 [==============================] - 0s 15us/step - loss: 6.7505 - mean_squared_error: 0.1736 - val_loss: 6.6343 - val_mean_squared_error: 0.1707
Epoch 62/150
984/984 [==============================] - 0s 16us/step - loss: 6.5773 - mean_squared_error: 0.1733 - val_loss: 6.4658 - val_mean_squared_error: 0.1693
Epoch 63/150
984/984 [==============================] - 0s 16us/step - loss: 6.4104 - mean_squared_error: 0.1734 - val_loss: 6.3019 - val_mean_squared_error: 0.1690
Epoch 64/150
984/984 [==============================] - 0s 15us/step - loss: 6.2482 - mean_squared_error: 0.1728 - val_loss: 6.1437 - val_mean_squared_error: 0.1687
Epoch 65/150
984/984 [==============================] - 0s 15us/step - loss: 6.0899 - mean_squared_error: 0.1704 - val_loss: 5.9875 - val_mean_squared_error: 0.1673
Epoch 66/150
984/984 [==============================] - 0s 17us/step - loss: 5.9393 - mean_squared_error: 0.1722 - val_loss: 5.8440 - val_mean_squared_error: 0.1692
Epoch 67/150
984/984 [==============================] - 0s 16us/step - loss: 5.7924 - mean_squared_error: 0.1718 - val_loss: 5.6956 - val_mean_squared_error: 0.1669
Epoch 68/150
984/984 [==============================] - 0s 16us/step - loss: 5.6514 - mean_squared_error: 0.1714 - val_loss: 5.5549 - val_mean_squared_error: 0.1669
Epoch 69/150
984/984 [==============================] - 0s 17us/step - loss: 5.5095 - mean_squared_error: 0.1703 - val_loss: 5.4317 - val_mean_squared_error: 0.1740
Epoch 70/150
984/984 [==============================] - 0s 22us/step - loss: 5.3771 - mean_squared_error: 0.1693 - val_loss: 5.2860 - val_mean_squared_error: 0.1659
Epoch 71/150
984/984 [==============================] - 0s 18us/step - loss: 5.2470 - mean_squared_error: 0.1706 - val_loss: 5.1600 - val_mean_squared_error: 0.1648
Epoch 72/150
984/984 [==============================] - 0s 17us/step - loss: 5.1182 - mean_squared_error: 0.1693 - val_loss: 5.0354 - val_mean_squared_error: 0.1649
Epoch 73/150
984/984 [==============================] - 0s 16us/step - loss: 4.9962 - mean_squared_error: 0.1682 - val_loss: 4.9130 - val_mean_squared_error: 0.1643
Epoch 74/150
984/984 [==============================] - 0s 16us/step - loss: 4.8759 - mean_squared_error: 0.1695 - val_loss: 4.8067 - val_mean_squared_error: 0.1701
Epoch 75/150
984/984 [==============================] - 0s 17us/step - loss: 4.7613 - mean_squared_error: 0.1675 - val_loss: 4.6808 - val_mean_squared_error: 0.1632
Epoch 76/150
984/984 [==============================] - 0s 17us/step - loss: 4.6477 - mean_squared_error: 0.1684 - val_loss: 4.5717 - val_mean_squared_error: 0.1627
Epoch 77/150
984/984 [==============================] - 0s 17us/step - loss: 4.5361 - mean_squared_error: 0.1668 - val_loss: 4.4611 - val_mean_squared_error: 0.1614
Epoch 78/150
984/984 [==============================] - 0s 18us/step - loss: 4.4297 - mean_squared_error: 0.1675 - val_loss: 4.3677 - val_mean_squared_error: 0.1677
Epoch 79/150
984/984 [==============================] - 0s 16us/step - loss: 4.3335 - mean_squared_error: 0.1715 - val_loss: 4.2554 - val_mean_squared_error: 0.1610
Epoch 80/150
984/984 [==============================] - 0s 16us/step - loss: 4.2268 - mean_squared_error: 0.1657 - val_loss: 4.1622 - val_mean_squared_error: 0.1627
Epoch 81/150
984/984 [==============================] - 0s 16us/step - loss: 4.1307 - mean_squared_error: 0.1673 - val_loss: 4.0617 - val_mean_squared_error: 0.1598
Epoch 82/150
984/984 [==============================] - 0s 18us/step - loss: 4.0355 - mean_squared_error: 0.1676 - val_loss: 3.9793 - val_mean_squared_error: 0.1664
Epoch 83/150
984/984 [==============================] - 0s 19us/step - loss: 3.9462 - mean_squared_error: 0.1667 - val_loss: 3.8768 - val_mean_squared_error: 0.1583
Epoch 84/150
984/984 [==============================] - 0s 18us/step - loss: 3.8549 - mean_squared_error: 0.1670 - val_loss: 3.7960 - val_mean_squared_error: 0.1617
Epoch 85/150
984/984 [==============================] - 0s 18us/step - loss: 3.7697 - mean_squared_error: 0.1676 - val_loss: 3.7017 - val_mean_squared_error: 0.1568
Epoch 86/150
984/984 [==============================] - 0s 17us/step - loss: 3.6849 - mean_squared_error: 0.1673 - val_loss: 3.6221 - val_mean_squared_error: 0.1576
Epoch 87/150
984/984 [==============================] - 0s 17us/step - loss: 3.6010 - mean_squared_error: 0.1667 - val_loss: 3.5415 - val_mean_squared_error: 0.1579
Epoch 88/150
984/984 [==============================] - 0s 17us/step - loss: 3.5213 - mean_squared_error: 0.1661 - val_loss: 3.4622 - val_mean_squared_error: 0.1572
Epoch 89/150
984/984 [==============================] - 0s 17us/step - loss: 3.4437 - mean_squared_error: 0.1663 - val_loss: 3.3849 - val_mean_squared_error: 0.1567
Epoch 90/150
984/984 [==============================] - 0s 19us/step - loss: 3.3673 - mean_squared_error: 0.1661 - val_loss: 3.3101 - val_mean_squared_error: 0.1564
Epoch 91/150
984/984 [==============================] - 0s 19us/step - loss: 3.2948 - mean_squared_error: 0.1670 - val_loss: 3.2368 - val_mean_squared_error: 0.1553
Epoch 92/150
984/984 [==============================] - 0s 17us/step - loss: 3.2231 - mean_squared_error: 0.1674 - val_loss: 3.1673 - val_mean_squared_error: 0.1559
Epoch 93/150
984/984 [==============================] - 0s 17us/step - loss: 3.1538 - mean_squared_error: 0.1677 - val_loss: 3.0964 - val_mean_squared_error: 0.1545
Epoch 94/150
984/984 [==============================] - 0s 17us/step - loss: 3.0859 - mean_squared_error: 0.1670 - val_loss: 3.0288 - val_mean_squared_error: 0.1537
Epoch 95/150
984/984 [==============================] - 0s 17us/step - loss: 3.0200 - mean_squared_error: 0.1678 - val_loss: 2.9650 - val_mean_squared_error: 0.1543
Epoch 96/150
984/984 [==============================] - 0s 17us/step - loss: 2.9549 - mean_squared_error: 0.1682 - val_loss: 2.9023 - val_mean_squared_error: 0.1555
Epoch 97/150
984/984 [==============================] - 0s 17us/step - loss: 2.8915 - mean_squared_error: 0.1674 - val_loss: 2.8376 - val_mean_squared_error: 0.1539
Epoch 98/150
984/984 [==============================] - 0s 17us/step - loss: 2.8304 - mean_squared_error: 0.1689 - val_loss: 2.7862 - val_mean_squared_error: 0.1600
Epoch 99/150
984/984 [==============================] - 0s 17us/step - loss: 2.7735 - mean_squared_error: 0.1693 - val_loss: 2.7179 - val_mean_squared_error: 0.1536
Epoch 100/150
984/984 [==============================] - 0s 17us/step - loss: 2.7129 - mean_squared_error: 0.1690 - val_loss: 2.6632 - val_mean_squared_error: 0.1551
Epoch 101/150
984/984 [==============================] - 0s 16us/step - loss: 2.6560 - mean_squared_error: 0.1694 - val_loss: 2.6052 - val_mean_squared_error: 0.1542
Epoch 102/150
984/984 [==============================] - 0s 18us/step - loss: 2.6001 - mean_squared_error: 0.1687 - val_loss: 2.5525 - val_mean_squared_error: 0.1554
Epoch 103/150
984/984 [==============================] - 0s 17us/step - loss: 2.5462 - mean_squared_error: 0.1685 - val_loss: 2.4946 - val_mean_squared_error: 0.1523
Epoch 104/150
984/984 [==============================] - 0s 17us/step - loss: 2.4936 - mean_squared_error: 0.1700 - val_loss: 2.4453 - val_mean_squared_error: 0.1546
Epoch 105/150
984/984 [==============================] - 0s 17us/step - loss: 2.4420 - mean_squared_error: 0.1693 - val_loss: 2.3913 - val_mean_squared_error: 0.1521
Epoch 106/150
984/984 [==============================] - 0s 18us/step - loss: 2.3931 - mean_squared_error: 0.1719 - val_loss: 2.3509 - val_mean_squared_error: 0.1581
Epoch 107/150
984/984 [==============================] - 0s 18us/step - loss: 2.3454 - mean_squared_error: 0.1705 - val_loss: 2.2939 - val_mean_squared_error: 0.1522
Epoch 108/150
984/984 [==============================] - 0s 19us/step - loss: 2.2968 - mean_squared_error: 0.1718 - val_loss: 2.2550 - val_mean_squared_error: 0.1578
Epoch 109/150
984/984 [==============================] - 0s 18us/step - loss: 2.2509 - mean_squared_error: 0.1712 - val_loss: 2.1998 - val_mean_squared_error: 0.1512
Epoch 110/150
984/984 [==============================] - 0s 17us/step - loss: 2.2063 - mean_squared_error: 0.1725 - val_loss: 2.1611 - val_mean_squared_error: 0.1551
Epoch 111/150
984/984 [==============================] - 0s 17us/step - loss: 2.1610 - mean_squared_error: 0.1719 - val_loss: 2.1136 - val_mean_squared_error: 0.1528
Epoch 112/150
984/984 [==============================] - 0s 16us/step - loss: 2.1225 - mean_squared_error: 0.1749 - val_loss: 2.0712 - val_mean_squared_error: 0.1516
Epoch 113/150
984/984 [==============================] - 0s 17us/step - loss: 2.0807 - mean_squared_error: 0.1764 - val_loss: 2.0363 - val_mean_squared_error: 0.1563
Epoch 114/150
984/984 [==============================] - 0s 17us/step - loss: 2.0377 - mean_squared_error: 0.1732 - val_loss: 1.9907 - val_mean_squared_error: 0.1535
Epoch 115/150
984/984 [==============================] - 0s 17us/step - loss: 1.9966 - mean_squared_error: 0.1731 - val_loss: 1.9518 - val_mean_squared_error: 0.1536
Epoch 116/150
984/984 [==============================] - 0s 17us/step - loss: 1.9588 - mean_squared_error: 0.1747 - val_loss: 1.9172 - val_mean_squared_error: 0.1559
Epoch 117/150
984/984 [==============================] - 0s 17us/step - loss: 1.9228 - mean_squared_error: 0.1745 - val_loss: 1.8751 - val_mean_squared_error: 0.1520
Epoch 118/150
984/984 [==============================] - 0s 17us/step - loss: 1.8853 - mean_squared_error: 0.1759 - val_loss: 1.8449 - val_mean_squared_error: 0.1568
Epoch 119/150
984/984 [==============================] - 0s 18us/step - loss: 1.8503 - mean_squared_error: 0.1749 - val_loss: 1.8048 - val_mean_squared_error: 0.1536
Epoch 120/150
984/984 [==============================] - 0s 16us/step - loss: 1.8151 - mean_squared_error: 0.1762 - val_loss: 1.7698 - val_mean_squared_error: 0.1532
Epoch 121/150
984/984 [==============================] - 0s 19us/step - loss: 1.7804 - mean_squared_error: 0.1763 - val_loss: 1.7354 - val_mean_squared_error: 0.1531
Epoch 122/150
984/984 [==============================] - 0s 17us/step - loss: 1.7468 - mean_squared_error: 0.1763 - val_loss: 1.7035 - val_mean_squared_error: 0.1539
Epoch 123/150
984/984 [==============================] - 0s 18us/step - loss: 1.7146 - mean_squared_error: 0.1767 - val_loss: 1.6724 - val_mean_squared_error: 0.1547
Epoch 124/150
984/984 [==============================] - 0s 17us/step - loss: 1.6835 - mean_squared_error: 0.1773 - val_loss: 1.6402 - val_mean_squared_error: 0.1541
Epoch 125/150
984/984 [==============================] - 0s 17us/step - loss: 1.6528 - mean_squared_error: 0.1776 - val_loss: 1.6087 - val_mean_squared_error: 0.1534
Epoch 126/150
984/984 [==============================] - 0s 17us/step - loss: 1.6228 - mean_squared_error: 0.1788 - val_loss: 1.5802 - val_mean_squared_error: 0.1542
Epoch 127/150
984/984 [==============================] - 0s 16us/step - loss: 1.5942 - mean_squared_error: 0.1784 - val_loss: 1.5502 - val_mean_squared_error: 0.1541
Epoch 128/150
984/984 [==============================] - 0s 17us/step - loss: 1.5651 - mean_squared_error: 0.1794 - val_loss: 1.5236 - val_mean_squared_error: 0.1554
Epoch 129/150
984/984 [==============================] - 0s 16us/step - loss: 1.5373 - mean_squared_error: 0.1792 - val_loss: 1.4961 - val_mean_squared_error: 0.1558
Epoch 130/150
984/984 [==============================] - 0s 16us/step - loss: 1.5105 - mean_squared_error: 0.1796 - val_loss: 1.4665 - val_mean_squared_error: 0.1534
Epoch 131/150
984/984 [==============================] - 0s 16us/step - loss: 1.4843 - mean_squared_error: 0.1809 - val_loss: 1.4426 - val_mean_squared_error: 0.1556
Epoch 132/150
984/984 [==============================] - 0s 15us/step - loss: 1.4582 - mean_squared_error: 0.1805 - val_loss: 1.4153 - val_mean_squared_error: 0.1547
Epoch 133/150
984/984 [==============================] - 0s 15us/step - loss: 1.4328 - mean_squared_error: 0.1814 - val_loss: 1.3905 - val_mean_squared_error: 0.1551
Epoch 134/150
984/984 [==============================] - 0s 16us/step - loss: 1.4089 - mean_squared_error: 0.1821 - val_loss: 1.3653 - val_mean_squared_error: 0.1544
Epoch 135/150
984/984 [==============================] - 0s 16us/step - loss: 1.3845 - mean_squared_error: 0.1821 - val_loss: 1.3428 - val_mean_squared_error: 0.1556
Epoch 136/150
984/984 [==============================] - 0s 15us/step - loss: 1.3615 - mean_squared_error: 0.1828 - val_loss: 1.3194 - val_mean_squared_error: 0.1554
Epoch 137/150
984/984 [==============================] - 0s 15us/step - loss: 1.3392 - mean_squared_error: 0.1831 - val_loss: 1.2957 - val_mean_squared_error: 0.1543
Epoch 138/150
984/984 [==============================] - 0s 16us/step - loss: 1.3180 - mean_squared_error: 0.1849 - val_loss: 1.2771 - val_mean_squared_error: 0.1569
Epoch 139/150
984/984 [==============================] - 0s 16us/step - loss: 1.2964 - mean_squared_error: 0.1837 - val_loss: 1.2531 - val_mean_squared_error: 0.1552
Epoch 140/150
984/984 [==============================] - 0s 16us/step - loss: 1.2748 - mean_squared_error: 0.1846 - val_loss: 1.2316 - val_mean_squared_error: 0.1549
Epoch 141/150
984/984 [==============================] - 0s 18us/step - loss: 1.2547 - mean_squared_error: 0.1850 - val_loss: 1.2112 - val_mean_squared_error: 0.1549
Epoch 142/150
984/984 [==============================] - 0s 18us/step - loss: 1.2343 - mean_squared_error: 0.1855 - val_loss: 1.1915 - val_mean_squared_error: 0.1555
Epoch 143/150
984/984 [==============================] - 0s 17us/step - loss: 1.2147 - mean_squared_error: 0.1856 - val_loss: 1.1718 - val_mean_squared_error: 0.1552
Epoch 144/150
984/984 [==============================] - 0s 16us/step - loss: 1.1962 - mean_squared_error: 0.1867 - val_loss: 1.1539 - val_mean_squared_error: 0.1560
Epoch 145/150
984/984 [==============================] - 0s 16us/step - loss: 1.1780 - mean_squared_error: 0.1870 - val_loss: 1.1336 - val_mean_squared_error: 0.1551
Epoch 146/150
984/984 [==============================] - 0s 20us/step - loss: 1.1588 - mean_squared_error: 0.1865 - val_loss: 1.1176 - val_mean_squared_error: 0.1568
Epoch 147/150
984/984 [==============================] - 0s 21us/step - loss: 1.1411 - mean_squared_error: 0.1869 - val_loss: 1.0980 - val_mean_squared_error: 0.1553
Epoch 148/150
984/984 [==============================] - 0s 15us/step - loss: 1.1245 - mean_squared_error: 0.1876 - val_loss: 1.0802 - val_mean_squared_error: 0.1547
Epoch 149/150
984/984 [==============================] - 0s 16us/step - loss: 1.1083 - mean_squared_error: 0.1890 - val_loss: 1.0653 - val_mean_squared_error: 0.1562
Epoch 150/150
984/984 [==============================] - 0s 16us/step - loss: 1.0917 - mean_squared_error: 0.1882 - val_loss: 1.0486 - val_mean_squared_error: 0.1561
In [84]:
# MSE and r squared values
nnet_3_test_pred = nnet_3.predict(X_test)
nnet_3_train_pred = nnet_3.predict(X_train)
print("Training MSE:", round(mean_squared_error(y_train, nnet_3_train_pred),4))
print("Validation MSE:", round(mean_squared_error(y_test, nnet_3_test_pred),4))
Training MSE: 0.185
Validation MSE: 0.2152

Major Insights and Findings

From a low-risk and high-return on investment standpoint the zipcodes 10025 and 10036 in Mahattan are good investment options

From a low-risk and moderate-return standpoint, I would recommend the zipcode 11434 in Queens and the zipcodes 10306 and 10305 in Staten Island as potentially good avenues of investment

From a breakeven perspective, I would recommend zipcode 11434 in Queens as the best ZipCode to invest in, followed by 10306 and 10303 in Staten Islands

From a competitive analysis perspective, it is recommeneded the client should exercise caution while investing in Queens or Staten Islands as these neighbourhoods have a significant proportion of listings that provide high quality stay to its customers

Its observed that out of the models tested the Neural net model with three hidden layers is by far the best one. It has the lowest test MSE than the Simpler Linear Regression, Random Forest, XGBoost Model and the neural net configuration with two hidden layers